What is our primary use case?
We use Starburst with one client, a very large organization comparable to a financial institution, that is exploring how to remove data silos and enable data access across departments. They are still evaluating Starburst, and everything is in the research phase. We haven't used it in production yet.
How has it helped my organization?
The benefit comes from Starburst's internal query optimization. The engine checks the data volume, how the query can be optimized, which pushdown to apply, and so on. It's all technical and internal to the system.
I've connected to databases and Salesforce, among other things. I haven't personally tried MongoDB or other NoSQL databases, but they are supported in the Starburst ecosystem. I have definitely tried relational databases, as well as others such as the graph database Neo4j.
What is most valuable?
Starburst has changed the ecosystem: you no longer need to move the data, which saves storage costs. It provides unified, virtual data access, allowing you to perform operations directly on the source. There's also parallel computing, data optimization, and cost optimization.
Primarily, the idea of not moving data from source to target and avoiding duplication is a major benefit of Starburst. You can connect to any data source from any region and have a unified view of data across your organization. There might be some latency, but it's worth it for the unified access.
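To make the "query in place" idea concrete, here is a minimal sketch that builds one federated SQL statement joining tables from two different sources. Starburst is built on Trino, which addresses tables as catalog.schema.table. The catalog, schema, table, and column names below are hypothetical, and the helper functions are invented for illustration; they are not part of any Starburst API.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified catalog.schema.table name."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query(min_balance: int) -> str:
    """Build one SQL statement that joins two sources without copying data."""
    # Hypothetical catalogs: these names are assumptions, not a real deployment.
    accounts = qualified("postgres_crm", "public", "accounts")
    contacts = qualified("salesforce", "sales", "contact")
    return (
        "SELECT a.account_id, a.balance, c.email\n"
        f"FROM {accounts} AS a\n"
        f"JOIN {contacts} AS c ON c.account_id = a.account_id\n"
        # Filtering early gives the engine a chance to push the predicate down.
        f"WHERE a.balance >= {int(min_balance)}"
    )


if __name__ == "__main__":
    print(build_federated_query(10000))
```

The resulting string could be submitted through any client connected to the cluster. The point is that both sources appear in a single query, and neither table is copied to a target system first.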
Security and data governance are addressed through Immuta or Ranger policies. ABAC (Attribute-Based Access Control), PBAC (Policy-Based Access Control), and RBAC (Role-Based Access Control) are all supported. We primarily use Ranger.
What needs improvement?
There are no specific projects supported by Starburst regarding AI initiatives or machine learning projects.
In the future, if we have all the data available, we can definitely capitalize on AI/ML and LLM capabilities to summarize data and gain insights. That's our future goal, but we haven't reached that point yet.
There should be support for REST API data sources to access data from the web. We often have data coming in and communicate with data sources via REST API calls. I don't see that capability in Starburst currently; everything is through JDBC or ODBC. If Starburst could seamlessly access data using REST API capabilities, it would be a game-changer.
The self-service data management features, like self-service materialized views, are great, but they can be a bit complex for basic users to understand.
For how long have I used the solution?
I have been using it for one year.
What do I think about the stability of the solution?
I haven't personally encountered any performance issues unless there's a cluster computing problem. If there are issues with the compute resources, like not enough nodes or memory, then obviously there will be performance problems. Sometimes it happens because of a poorly written query.
For example, if you have multiple systems and join them with a bad query, the process will keep running without returning any results. That's not a system problem; it's a user problem, and users will experience long wait times.
What do I think about the scalability of the solution?
Our customer's environment has around 200 to 300 users.
How are customer service and support?
We are connected with Starburst directly because we are working with them as a partner. We have direct communication through a Slack channel, and they are always there to help if we encounter any challenges or technical problems.
Starburst is a new product. Everyone is very nice and knowledgeable. Obviously, everyone has their own bandwidth, but mostly, we get help on time and get things done.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
There is another product called Dremio. The difference between Dremio and Starburst is that Starburst needs a whole cluster restart when we need to add any new catalog. If we need to add a new database or data source, we have to configure the entire ecosystem, including YAML files, and then deploy it. Once deployed, we have to restart the whole server for the changes to take effect. That's a very bad architecture, in my opinion.
In Dremio, you can configure things on the fly without needing a restart. You can do it from the front end itself. In Starburst, you have to go to the back end and need support from the infrastructure team. There's no front end for configuring the catalog. I feel they should develop a process to configure catalogs from the front end rather than requiring backend deployment and cluster restarts every time. It takes a lot of time and is not a good architecture.
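For context on what "configuring a catalog" involves, here is a minimal sketch that renders the kind of key=value properties a Trino-style catalog definition contains. The property names follow the Trino PostgreSQL connector. The host, database, user, and catalog contents are placeholders, and the password line assumes environment-variable substitution is available in the deployment. How this text is embedded in the Helm values and rolled out depends on the setup.

```python
def render_catalog_properties(props: dict[str, str]) -> str:
    """Render key=value lines in the style of a catalog properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


# Hypothetical PostgreSQL catalog; host, database, and user are placeholders.
finance_db = {
    "connector.name": "postgresql",
    "connection-url": "jdbc:postgresql://db.example.internal:5432/finance",
    "connection-user": "starburst_reader",
    # Assumes the secret is injected through an environment variable.
    "connection-password": "${ENV:PG_PASSWORD}",
}

if __name__ == "__main__":
    print(render_catalog_properties(finance_db), end="")
```

Each new data source means another block like this in the backend configuration, followed by a deployment. That is the step I would prefer to do from the front end.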
For data analytics, we primarily use Tableau and Power BI, mainly Tableau. We connect Starburst with Tableau for analysis.
How was the initial setup?
It's Kubernetes-based, so it's a little challenging. Doing it externally for learning purposes was fine, but deploying in Kubernetes requires users to get through a learning curve for Kubernetes, Helm, and related technologies. It's a better-managed and better-performing setup for Starburst, but definitely challenging. You have to make a lot of custom changes if you need to do things that Starburst doesn't support out of the box.
I worked with the cloud version. I think there's also an on-premises option where they spin up the cluster in their own data center. I haven't done that, but I have experience setting it up in the AWS cloud.
The deployment takes a good amount of time because there are a lot of things to do. It's not just building the Helm chart; you'll also encounter networking and access challenges. It takes a day or two if you're not very familiar with the process.
Maintenance, when it's in the cloud, is handled by the system itself through auto-scaling and the Kubernetes cluster. However, maintenance can sometimes be challenging because clients have specific requirements for their ecosystem, like security or governance, that Starburst might not support out of the box.
Starburst is constantly evolving, but it takes time for older versions to get those features. So, you have to make sure your system works with both the current and future versions of Starburst. It's an ongoing process for any organization.
What about the implementation team?
We were trained on it, so we could handle it ourselves. We had a learning curve, but we received training from Starburst. Now we're training others in the organization.
Which other solutions did I evaluate?
What other advice do I have?
Starburst requires a heavy investment in infrastructure; it needs heavy computing and storage. If your use case doesn't involve heavy processing or storage, then Starburst might not be the solution. It's more suitable for large ecosystems spread across regions, or companies with a massive employee base. In those cases, Starburst sounds good, but Dremio might be better for smaller organizations. I can't compare them directly, but Starburst definitely needs a good infrastructure investment, good infrastructure management, and skilled personnel.
The product is nice, there's no doubt about that. It's very scalable, fast performing, and supports many catalogs and connectors that Dremio doesn't have. Dremio is limited to ten to fifteen connectors, while Starburst supports forty to fifty, so it has a much bigger ecosystem. In that way, Starburst wins.
Between Dremio and Starburst, considering the connectors, Starburst gets a nine out of ten.


