What is our primary use case?
My main use case for Starburst Galaxy is querying petabytes of data across many data sources, using its federated query engine to join data from different databases.
My data sources include Oracle, DB2, and a MongoDB cluster, and I join all of them using Starburst Galaxy's federated querying feature. I then land the joined data in S3 storage as Iceberg tables and use those tables for dashboarding in Power BI or Tableau.
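As a rough illustration of that workflow, the following sketch builds a Trino-style `CREATE TABLE ... AS SELECT` statement that joins three catalogs and writes the result to an Iceberg catalog. Every catalog, schema, table, and column name here (`oracle_cat`, `db2_cat`, `mongo_cat`, `iceberg_cat`, and so on) is hypothetical, and the helper only produces the SQL text; it does not connect to a cluster.

```python
from textwrap import dedent


def build_federated_ctas(target_table: str) -> str:
    """Return a Trino-style CTAS statement that joins three catalogs.

    All catalog, schema, table, and column names below are hypothetical
    placeholders, not names taken from the review.
    """
    return dedent(f"""\
        CREATE TABLE {target_table}
        WITH (format = 'PARQUET')
        AS
        SELECT o.order_id, o.amount, c.customer_name, e.last_event
        FROM oracle_cat.sales.orders AS o
        JOIN db2_cat.crm.customers AS c
          ON o.customer_id = c.customer_id
        LEFT JOIN mongo_cat.app.events AS e
          ON o.customer_id = e.customer_id
        """).strip()


if __name__ == "__main__":
    # Print the statement so it can be pasted into a SQL editor.
    print(build_federated_ctas("iceberg_cat.analytics.customer_orders"))
```

Each source is addressed as `catalog.schema.table`, so the join reads like a single-database query even though the tables live in different systems.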
What is most valuable?
The most valuable features of Starburst Galaxy for me are very fast query results, automatic indexing of data for long tables, a cost-based optimizer that reduces the time to query large tables, and an agentic feature that lets me talk to my data.
In my day-to-day work I rely most on querying across different databases and on automatic indexing, because I am a data science architect who needs query results in a very short time. If I do not meet my SLAs, my customers raise a case. I have tried many other tools, but Starburst Galaxy fits this need best.
Starburst Galaxy has positively impacted my organization. We were struggling with Denodo and Dremio, which had their own features but were not helpful for querying large amounts of data, especially semi-structured or unstructured data. Starburst Galaxy addresses this with YAML files and manifest files for automated maintenance, and it helps reduce the small file problem in different HDFS systems. Additionally, Starburst Galaxy has an MCP server that connects to various agentic pipelines, reducing the time to market for data consumption.
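For readers unfamiliar with the small file problem, here is a minimal sketch of one way it is commonly handled on Iceberg tables in Trino-based engines: periodic `ALTER TABLE ... EXECUTE` maintenance commands. This is not necessarily the YAML- or manifest-driven mechanism described above; the table name, thresholds, and helper function are assumptions chosen for illustration, and the helper only generates the SQL text.

```python
def iceberg_maintenance_statements(
    table: str,
    file_size_threshold: str = "128MB",
    retention: str = "7d",
) -> list[str]:
    """Build Trino-style maintenance statements for one Iceberg table.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    return [
        # Compact files smaller than the threshold into larger ones.
        f"ALTER TABLE {table} EXECUTE "
        f"optimize(file_size_threshold => '{file_size_threshold}')",
        # Drop snapshots older than the retention period.
        f"ALTER TABLE {table} EXECUTE "
        f"expire_snapshots(retention_threshold => '{retention}')",
        # Delete files no longer referenced by any snapshot.
        f"ALTER TABLE {table} EXECUTE "
        f"remove_orphan_files(retention_threshold => '{retention}')",
    ]


if __name__ == "__main__":
    # Hypothetical table name.
    for stmt in iceberg_maintenance_statements("iceberg_cat.analytics.customer_orders"):
        print(stmt + ";")
```

Running compaction before snapshot expiry and orphan cleanup means the files replaced by compaction become eligible for removal in later runs.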
What needs improvement?
Starburst Galaxy could be improved by adding discovery of unstructured data and built-in streaming ingestion; we currently use Kafka for that purpose. We rely on third-party tools for ingesting the streaming files, and I see they are integrating with MCP and agentic pipelines. Including these features would make Starburst Galaxy a much better tool.
For how long have I used the solution?
I have been using Starburst Galaxy for the last four years.
What do I think about the stability of the solution?
Starburst Galaxy is stable.
What do I think about the scalability of the solution?
Starburst Galaxy's scalability is excellent as it can easily scale up to large clusters, with many nodes configured for the amount of data ingested daily, allowing it to handle petabytes of data efficiently.
How are customer service and support?
Customer support is quite good; we faced issues with complex queries and reached out to Starburst, and they were very helpful in troubleshooting those issues.
Which solution did I use previously and why did I switch?
We previously used Dremio, but it was not fast and had a lot of limitations in processing large volumes of unstructured and structured data, which led us to switch to Starburst Galaxy.
How was the initial setup?
The experience with pricing, setup cost, and licensing was straightforward, as I could easily purchase the license in the AWS Marketplace. For on-premises, we contacted Starburst for licensing costs. The setup was quite easy: a one-click setup that allows us to discover the catalogs for querying. The licensing cost was reasonable compared to the value we receive from Starburst Galaxy, making it a good product overall.
What was our ROI?
I have seen a return on investment. Quick dashboarding lets me publish dashboards and data products faster, saving significant time when publishing to different endpoints. With the reduced turnaround time and our capability to serve many more customers than before, this translates into a cost savings of around $500,000, money we previously spent on getting reports published.
Which other solutions did I evaluate?
We did not evaluate other options before choosing Starburst Galaxy; we went directly to it.
What other advice do I have?
I would advise others looking into Starburst Galaxy to consider it one of the best tools on the market, especially for ingesting large datasets and federating queries across different data sources. In my view, Starburst Galaxy is the best engine currently available, and they should try it for themselves to see the difference in query times. My overall review rating is 8 out of 10.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?