

Databricks and Google Cloud Dataflow compete in the cloud-based data management and analytics category. Databricks appears to have the upper hand due to its robust machine learning integration capabilities and flexible deployment options, while Google Cloud Dataflow is better integrated with the Google Cloud ecosystem and offers cost-effective solutions.
Features: Databricks handles large-scale analytics efficiently with advanced machine learning integration, supports multiple languages such as Python, R, and Scala, and enables collaboration through notebooks, with the Delta Lake format for data management. Google Cloud Dataflow offers strong integration within the Google ecosystem, leverages Apache Beam for both batch and streaming processing, and provides flexibility in programming language support.
Room for Improvement: Databricks should enhance visualization capabilities, improve integration features, and simplify model scoring and monitoring. Additionally, improvements in predictive analysis libraries and clearer error messages are necessary. Google Cloud Dataflow could benefit from better error logging, reduced startup time for jobs, and improved technical support, as well as enhanced integration with IT data flow and a more accessible setup process.
Ease of Deployment and Customer Service: Databricks supports versatile deployment on multiple cloud platforms like Azure and AWS, offering flexibility and favorably rated response times, though technical support could be improved. Google Cloud Dataflow excels in Google Cloud Platform integration, noted for its clear documentation, though some users have faced occasional support challenges.
Pricing and ROI: Databricks is perceived as expensive, particularly for small and mid-sized clusters, but delivers good ROI due to high efficiency and scalability. Google Cloud Dataflow is considered a cost-effective alternative, with flexible pricing based on compute resources and usage patterns, which user reviews cite as a notable advantage.
This reduction in both time and money resulted in real-time impact and significant cost savings.
For a lot of different tasks, including machine learning, it is a nice solution.
When it comes to big data processing, I prefer Databricks over other solutions.
Whenever we reach out, they respond promptly.
As of now, we are raising issues and they are providing solutions without any problems.
I rate the technical support as fine because they have several levels of technical support available, especially for partners, who get really good support from Databricks on new features.
The fact that no interaction is needed shows their great support since I don't face issues.
Google's support team is good at resolving issues, especially with large data.
Whenever we have issues, we can consult with Google.
The sky's the limit with Databricks.
The patches have sometimes caused issues leading to our jobs being paused for about six hours.
Databricks is an easily scalable platform.
Google Cloud Dataflow has auto-scaling capabilities, allowing me to add different machine types based on pace and requirements.
As a team lead, I'm responsible for handling five to six applications, but Google Cloud Dataflow seems to handle our use case effectively.
Google Cloud Dataflow can handle large data processing for real-time streaming workloads as they grow, making it a good fit for our business.
They release patches that sometimes break our code.
Although it is too early to definitively state the platform's stability, we have not encountered any issues so far.
Databricks is definitely a very stable and reliable product.
I have not encountered any issues with the performance of Dataflow, as it is stable and backed by Google services.
The job we built has not failed once over six to seven months.
The automatic scaling feature helps maintain stability.
Adjusting features like worker nodes and node utilization during cluster creation could mitigate these failures.
We prefer using a small to mid-sized cluster for many jobs to keep costs low, but this sometimes doesn't support our operations properly.
We use MLflow for managing MLOps; however, further improvement would be beneficial, especially for large language models and related tools.
Outside of Google Cloud Platform, it is difficult for others to use, and it may need to be promoted as a technology in its own right.
Dealing with a huge volume of data causes failure due to array size.
I would like to see improvements in consistency and flexibility for schema design for NoSQL data stored in wide columns.
It is not a cheap solution.
I believe that in terms of credits for Databricks, we're spending between £15,000 and £20,000 a month.
It is part of a package received from Google, and they are not charging us too much.
Databricks' capability to process data in parallel enhances data processing speed.
The platform allows us to leverage cloud advantages effectively, enhancing our AI and ML projects.
The Unity Catalog is for data governance, and the Delta Lake is to build the lakehouse.
It supports multiple programming languages such as Java and Python, enabling flexibility without the need to learn something new.
The integration within Google Cloud Platform is very good.
Google Cloud Dataflow's features for event stream processing allow us to gain various insights like detecting real-time alerts.
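The event stream processing and real-time alerting mentioned above can be illustrated with a small fixed-window aggregation. This is a minimal sketch in plain Python, not Apache Beam or Dataflow code: the event shape, the 60-second window, the threshold, and the function names `window_sums` and `alerts` are all assumptions made for illustration. It also processes a finite list in one pass, whereas a streaming runner would emit results as each window closes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp_seconds, sensor_id, value).
Event = Tuple[int, str, float]

WINDOW_SECONDS = 60       # assumed fixed-window size
ALERT_THRESHOLD = 100.0   # assumed per-window alert threshold


def window_sums(events: Iterable[Event]) -> Dict[Tuple[int, str], float]:
    """Sum values per (window_start, sensor_id) using fixed windows."""
    sums: Dict[Tuple[int, str], float] = defaultdict(float)
    for ts, sensor, value in events:
        # Align the timestamp to the start of its fixed window.
        window_start = ts - (ts % WINDOW_SECONDS)
        sums[(window_start, sensor)] += value
    return dict(sums)


def alerts(events: Iterable[Event]) -> List[Tuple[int, str]]:
    """Return sorted (window_start, sensor_id) keys whose sum exceeds the threshold."""
    totals = window_sums(events)
    return sorted(key for key, total in totals.items() if total > ALERT_THRESHOLD)


# Made-up sample data, for illustration only.
sample_events = [
    (5, "a", 40.0),
    (30, "a", 70.0),
    (45, "b", 20.0),
    (65, "a", 10.0),
    (110, "b", 150.0),
]

print(alerts(sample_events))
```

With the sample data, sensor `a` crosses the threshold in the window starting at 0 seconds and sensor `b` in the window starting at 60 seconds.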
| Product | Market Share (%) |
|---|---|
| Databricks | 9.5% |
| Google Cloud Dataflow | 4.2% |
| Other | 86.3% |


| Company Size | Count |
|---|---|
| Small Business | 26 |
| Midsize Enterprise | 12 |
| Large Enterprise | 56 |

| Company Size | Count |
|---|---|
| Small Business | 3 |
| Midsize Enterprise | 2 |
| Large Enterprise | 10 |
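The two company-size tables above give raw reviewer counts, which are hard to compare because the totals differ. A minimal sketch of converting them to percentage shares follows. The names `first_table` and `second_table` are placeholders, since the source does not label which product each table belongs to; only the counts come from the tables.

```python
from typing import Dict


def shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw counts to percentage shares, rounded to one decimal place."""
    total = sum(counts.values())
    return {size: round(100 * n / total, 1) for size, n in counts.items()}


# Counts copied from the two tables; the variable names are placeholders.
first_table = {"Small Business": 26, "Midsize Enterprise": 12, "Large Enterprise": 56}
second_table = {"Small Business": 3, "Midsize Enterprise": 2, "Large Enterprise": 10}

for name, table in (("first table", first_table), ("second table", second_table)):
    print(name, shares(table))
```

On these counts, large enterprises account for about 59.6% of the first table and 66.7% of the second. The second table has only 15 entries in total, so its shares should be read with caution.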
Databricks offers a scalable, versatile platform that integrates seamlessly with Spark and multiple languages, supporting data engineering, machine learning, and analytics in a unified environment.
Databricks stands out for its scalability, ease of use, and powerful integration with Spark, multiple languages, and leading cloud services like Azure and AWS. It provides tools such as notebooks for collaboration, Delta Lake for efficient data management, and Unity Catalog for data governance. While it enhances data engineering and machine learning workflows, it faces challenges in visualization and third-party integration, and pricing and user interface navigation are common concerns. Despite needing improvements in connectivity and documentation, it remains popular for tasks like real-time processing and data pipeline management.
In the tech industry, Databricks empowers teams to perform comprehensive data analytics, enabling them to conduct extensive ETL operations, run predictive modeling, and prepare data for SparkML. In retail, it supports real-time data processing and batch streaming, aiding in better decision-making. Enterprises across sectors leverage its capabilities for creating secure APIs and managing data lakes effectively.
We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.