it_user1322136 - PeerSpot reviewer
Pre-sale Leader, Big Data Enterprise Solutions at a tech services company with 1,001-5,000 employees
Consultant
Apr 14, 2020
Easy to load and query data with SQL support, but it is difficult to deploy and the interface could be improved
Pros and Cons
  • "The most valuable feature is the ability to use SQL directly with Databricks."
  • "I have seen better user interfaces, so that is something that can be improved."

What is our primary use case?

My division works with Big Data and Data Science, and Databricks is one of the tools for Big Data that we work with. We are partners with Microsoft and we began working with this solution for one specific project in the financial industry.

What is most valuable?

The most valuable feature is the ability to use SQL directly with Databricks. That is the most relevant thing for my current project.

After deployment, it is easy to load files and query data.

What needs improvement?

I have seen better user interfaces, so that is something that can be improved.

It was quite hard to deploy.

For how long have I used the solution?

I have been using Databricks for about one year.

Buyer's Guide
Databricks
January 2026
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: January 2026.
881,665 professionals have used our research since 2012.

What do I think about the stability of the solution?

We have not found any bugs yet, although it is only the beginning of our work. I do not have enough information to say for sure.

What do I think about the scalability of the solution?

We have about 200 employees but it is only a small group using Databricks. We are at the beginning so scaling is not something we have had to do.

How are customer service and support?

We have not had to contact technical support because we are Microsoft partners, and a colleague of mine there is helping me directly.

Which solution did I use previously and why did I switch?

I have used Snowflake and one of the differences is that Snowflake is much easier to deploy.

How was the initial setup?

The first deployment is difficult. It is not straightforward and you have to think about a lot of stuff. It is not really like a SaaS deployment and there are a lot of steps that you have to take.

What about the implementation team?

We have our own team, which includes colleagues from Microsoft. Because the current project is for a large client, they would like to see this project succeed.

What's my experience with pricing, setup cost, and licensing?

We find Databricks to be very expensive, although this improved when we found out how to shut it down at night.

What other advice do I have?

Our client is a bank and some of the information can be shared outside of the organization, whereas some of the data is confidential and private. Using a purely on-premises solution would have made it more difficult to share information with the outside, which is one of the reasons that they wanted a cloud-based deployment. 

My advice for anybody who is considering this solution is that it is very good for unstructured or semi-structured data. If, however, you have structured data then I would recommend a columnar database like Snowflake or Vertica. These solutions are easier to deploy.

This is a good solution that is working well, but I don't think that it is really a SaaS.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer. Implementer
PeerSpot user
Tristan Bergh - PeerSpot reviewer
Data Scientist at a computer software company with 501-1,000 employees
Real User
Mar 16, 2020
Good built-in optimization, easy to use with a great user interface
Pros and Cons
  • "The built-in optimization recommendations halved query run times and allowed us to reach decision points and deliver insights very quickly."
  • "The product could be improved by expanding its visualization capabilities, which currently assist development in the notebook environment."

What is our primary use case?

We are using this solution to run large analytics queries and prepare datasets for SparkML and ML using PySpark.

We ran on multiple clusters set up for a minimum of three and a maximum of nine nodes having 16GB RAM each.

For one ad hoc requirement, a 32-node cluster was required.

Databricks clusters were set for autoscaling and to time out after forty minutes of inactivity. Multiple users attached their notebooks to a cluster. When some workloads required different libraries, a dedicated cluster was spun up for that user.
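
As a rough illustration of the settings described above, the following sketch builds a cluster specification with three to nine autoscaling workers and a forty-minute inactivity timeout. The field names follow the Databricks Clusters API, but the cluster name, runtime version, and node type are placeholders rather than values from this review, and treating the node counts as worker counts is an assumption.

```python
import json

# Hypothetical cluster spec mirroring the settings described in the review.
# The name, runtime version, and node type are placeholders (assumptions).
cluster_spec = {
    "cluster_name": "shared-analytics",      # assumed name
    "spark_version": "<runtime-version>",    # placeholder, not from the review
    "node_type_id": "<16GB-node-type>",      # placeholder for a 16GB RAM node
    # Autoscale between a minimum of three and a maximum of nine workers.
    "autoscale": {"min_workers": 3, "max_workers": 9},
    # Terminate the cluster after forty minutes of inactivity.
    "autotermination_minutes": 40,
}

# Serialize to the JSON body that a cluster-creation request would carry.
print(json.dumps(cluster_spec, indent=2))
```

A spec like this would typically be sent to the cluster-creation endpoint or entered through the cluster UI; an inactivity timeout is also one way to avoid paying for idle clusters overnight.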

How has it helped my organization?

Databricks took care of all the underlying cluster management seamlessly. We could configure our clusters to run and deliver results without any delays due to hardware configuration or installation issues.

Databricks allowed us to go from non-existent insights (because the datasets were just too large) to immediate and rich insights once the datasets were ingested into our PySpark notebooks.

What is most valuable?

Immense ease in running very large scale analytics, with a convenient and slick UI. This saved us from having to tweak, tune, dive into deeper abstractions, get involved in procurement, and also having to wait for other workloads to run.

The built-in optimization recommendations halved query run times and allowed us to reach decision points and deliver insights very quickly.

The Delta data format proved excellent. Databricks had already done the heavy lifting and optimized the format for large scale interactive querying. They saved us a lot of time.

What needs improvement?

The product could be improved by expanding its visualization capabilities, which currently assist development in the notebook environment. Perhaps a few connectors that auto-deploy to a reporting server?

More parallelized Machine Learning libraries would be excellent for predictive analytics algorithms.

For how long have I used the solution?

I have been using this solution for three years.

What do I think about the stability of the solution?

This solution is stable and proved very robust. When very obvious programmatic recommendations were not followed, causing memory overruns on a driver, the clusters required restarting.

What do I think about the scalability of the solution?

Absolutely, seamlessly, and massively scalable, limited only by budget. Also, the product itself offers real-time efficiency and optimization recommendations.

How are customer service and support?

So brilliant, it was never required. Their documentation is comprehensive, clear, simple, and thorough. 

Which solution did I use previously and why did I switch?

Previously I used Hive and Livy in Zeppelin on an in-house Hadoop installation. The queries constantly threw exceptions and timeouts and the necessary configuration changes proved time-consuming and problematic. Databricks, on the other hand, simply made all those problems vanish. 

How was the initial setup?

Setup and Support are single-click.

What about the implementation team?

We used an in-house team for implementation.

What was our ROI?

Our ROI was of the order of USD $75k per year for one deployment. We were able to switch our workloads from an onsite Hadoop cluster, billed to our department for more than USD $100k per year, to a Databricks workspace in the cloud for a quarter of that expenditure. 

Further, we were able to transparently and efficiently scale our queries to run in under fifteen minutes per major analytics use case, whereas on the in-house Hadoop cluster we had been subject to unstable queries and highly brittle data flows.

We are further reducing spending on our traditional RDBMS solution by offloading reporting workloads to the Databricks PySpark notebooks, which is reducing our use of expensive datacenter resources and freeing up RDBMS resources for OLTP loads.

What's my experience with pricing, setup cost, and licensing?

Set up a cluster in your cloud of choice, but Databricks' service might also be very competitive as their pricing units will be built in. 

Licensing on site I would counsel against, as on-site hardware issues tend to really delay and slow down delivery.

Which other solutions did I evaluate?

I evaluated Hortonworks, Livy, and Zeppelin. These were unsuitable due to the unavailability of sufficiently skilled personnel.

What other advice do I have?

By investing in people skilled in data querying, Python coding, and even basic Data Science, a Databricks setup will reward the business. 

Once the Databricks data flows are established, it is a matter of a few incremental steps to opening up streaming and running up-to-the-minute queries, allowing the business to build its data-driven processes. 

Databricks continues to advance the state-of-the-art and will be my go-to choice for mission-critical PySpark and ML workflows. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer1270416 - PeerSpot reviewer
Vice President, Business Intelligence and Analytics at a tech services company with 10,001+ employees
Real User
Feb 6, 2020
Stable cloud platform for data engineering and has a straightforward setup
Pros and Cons
  • "I haven't heard about any major stability issues. At this time I feel like it's stable."
  • "Pricing is one of the things that could be improved."

What is our primary use case?

We are still exploring the solution. For our clients, it works much better than the star schema models that they are trying to replace with it. We bring in Databricks and then see how they can leverage the additional analytical functionality around the Databricks cloud. Our use is mostly exploratory. We recommend Databricks, especially with the Azure cloud frameworks.

What needs improvement?

Pricing is one of the things that could be improved.

Also, the visual analytics space and the machine learning functions could be improved. I haven't explored them in depth, so I don't know which functions and features are already there. If they are not there, then I think that's something they should consider including.

For how long have I used the solution?

My team has been exploring Databricks for close to five or six months.

What do I think about the stability of the solution?

I haven't heard about any major stability issues. At this time I feel like it's stable.

What do I think about the scalability of the solution?

In terms of scalability, I think once we put it across for larger use-cases the scalability question will really arise. So we'll need detailed information. I assume that we will be able to scale up.

I think we do not have more than 10 people working on it now. Because we are in the earlier stages of implementation, it's more like a POC now. I really don't know whether it's been open for the larger audience yet.

How was the initial setup?

The initial setup was straightforward.

What about the implementation team?

It is better to be installed with the help of integrators, or consultants, or with an experienced team.

What other advice do I have?

The people using Databricks are doing data science-style work. I would call them power users trying to get a handle on it, though they are not data scientists. They are trying to understand it a little better for their future use.

On a scale of one to ten, I would rate it an eight, easy. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
PeerSpot user
reviewer1276107 - PeerSpot reviewer
Engineer at a tech services company with 10,001+ employees
Real User
Feb 4, 2020
An easy initial setup with a good time travel feature, but needs better model scoring
Pros and Cons
  • "The time travel feature is the solution's most valuable aspect."
  • "Databricks is an analytics platform. It should offer more data science. It should have more features for data scientists to work with."

What is our primary use case?

We use the solution for multiple items. We use lots of data crunching, development, and algorithms on it.

What is most valuable?

The time travel feature is the solution's most valuable aspect.
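
Assuming the reviewer means Delta Lake time travel, which lets a query read an earlier version of a table, a minimal sketch of building such queries might look like the following. The helper name `time_travel_query` and the table name `events` are hypothetical; the `VERSION AS OF` and `TIMESTAMP AS OF` clauses are Delta Lake SQL syntax.

```python
def time_travel_query(table, version=None, timestamp=None):
    """Build a Delta Lake time travel SELECT for one table version or timestamp.

    Hypothetical helper for illustration; it only assembles the SQL text.
    """
    # Exactly one of version / timestamp must be supplied.
    if (version is None) == (timestamp is None):
        raise ValueError("pass exactly one of version or timestamp")
    if version is not None:
        return f"SELECT * FROM {table} VERSION AS OF {int(version)}"
    return f"SELECT * FROM {table} TIMESTAMP AS OF '{timestamp}'"


# 'events' is an assumed table name, not one mentioned in the review.
print(time_travel_query("events", version=3))
print(time_travel_query("events", timestamp="2020-01-01"))
```

In a Databricks notebook, the resulting string could be passed to `spark.sql(...)` to read the table as it existed at that version or point in time.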

What needs improvement?

The management of the solution needs to be modernized. Managing the radius data is hard.

The solution requires model scoring. There's not a good way of knowing how the models are performing from a data science perspective. The solution needs more model scoring abilities. It doesn't necessarily need more model monitoring, but more model scoring and performance measurement from a data science perspective.

Databricks is an analytics platform. It should offer more data science. It should have more features for data scientists to work with.

For how long have I used the solution?

I've been using the solution for one year so far.

What do I think about the stability of the solution?

The solution is not exactly stable. We've faced a few bugs which have really affected it. There are bugs especially when it comes to connecting with Spark.

What do I think about the scalability of the solution?

It's hard to say how scalable the solution is. The scalability comes into play on the Spark side, not on the Databricks side.

We have about 20 people on the solution right now.

How are customer service and technical support?

We've never been in touch with technical support, so I don't have any experience in terms of dealing with them.

How was the initial setup?

The initial setup is straightforward. I wouldn't say that it's complex in any way.

Deployment times vary and really depend on multiple factors. It can take anywhere from a few weeks to a few months to deploy the solution. In our case, it took us about three months to fully deploy it.

It takes two to three people to deploy the solution.

What about the implementation team?

I deployed the solution with the help of my team.

What's my experience with pricing, setup cost, and licensing?

I'm not sure what the licensing costs are on the solution.

Which other solutions did I evaluate?

We did evaluate Amazon SageMaker before ultimately choosing Databricks. It's the only other solution we evaluated at the time.

What other advice do I have?

We're partners with Databricks.

We're using the latest version of the solution, but I can't recall what version number we are on.

I'd advise others considering the solution to look at usage. They shouldn't adopt the solution blindly. How the implementation and usage will go will depend on the skill of the data engineer and what your requirements are.

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
PeerSpot user
Data Science Consultant at a tech services company with 501-1,000 employees
Consultant
Jan 8, 2020
Good performance, easy to set up, and easy to use if you have a Python background
Pros and Cons
  • "I work in the data science field and I found Databricks to be very useful."
  • "It would be very helpful if Databricks could integrate with platforms in addition to Azure."

What is our primary use case?

We are building internal tools and custom models for predictive analysis. We are currently building a platform where we can integrate multiple data sources, such as data that is coming from Azure, AWS, or any SQL database. We integrate the data and run our models on top of that.

We primarily use Databricks for data processing and for SQL databases.

What is most valuable?

I found that PySpark is the most useful tool. It uses in-memory calculation and when you want to run a model it does it very quickly. We used to use Python and when we migrated to PySpark the performance was much better.

What needs improvement?

It would be very helpful if Databricks could integrate with platforms in addition to Azure.

Having an open-source version or having the option to get a trial version of Databricks would be very helpful.

It would be very useful for beginners if there were tutorials and examples on how to write code for PySpark, R, or Scala. Having examples would give people something to refer to and play with.

For how long have I used the solution?

We have been using Databricks for the past two or three years.

What do I think about the stability of the solution?

A couple of times I faced an issue where a long-running process was consuming a lot of time and then stopped abruptly. It necessitated starting the process again.

What do I think about the scalability of the solution?

We are in the prototyping stage so we do not plan on increasing our usage yet.

How are customer service and technical support?

We have not been in contact with technical support.

Which solution did I use previously and why did I switch?

Before using Databricks, we were running our own cluster with a web server that executed our Python queries.

How was the initial setup?

The initial setup is straightforward. With respect to deployment, the development can be done within half an hour and we can use code and deploy from there.

What about the implementation team?

We implemented Databricks on our own. We haven't deployed as such, as we are just running our queries and it is not in production yet.

What other advice do I have?

I work in the data science field and I found Databricks to be very useful. If I want to run any models then I can code them in PySpark. If you are coming from a Python background then you can write code in PySpark and it runs quickly. This is a good solution in terms of performance. 

I would rate this solution a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
PeerSpot user
it_user1235523 - PeerSpot reviewer
Machine Learning Engineer at a tech vendor with 51-200 employees
Real User
Dec 25, 2019
A convenient notebook, good stability, and a straightforward setup
Pros and Cons
  • "The most valuable aspect of the solution is its notebook. It's quite convenient to use, both in terms of research and development and for the final deployment. I can just declare the Spark jobs by the load tables. It's quite convenient."
  • "The solution could be improved by integrating it with data packets. Right now, the load tables provide a function, like team collaboration. Still, it's unclear as to if there's a function to create different branches and/or more branches. Our team had used data packets before, however, I feel it's difficult to integrate the current with the previous data packets."

What is our primary use case?

We primarily use the solution to run cron jobs; that is, to run the Spark jobs as cron jobs.

What is most valuable?

The most valuable aspect of the solution is its notebook. It's quite convenient to use, both in terms of research and development and for the final deployment. I can just declare the Spark jobs by the load tables. It's quite convenient.

What needs improvement?

The solution could be improved by integrating it with data packets. Right now, the load tables provide a function, like team collaboration. Still, it's unclear as to if there's a function to create different branches and/or more branches. Our team had used data packets before, however, I feel it's difficult to integrate the current with the previous data packets.

The support could be improved a bit around the database. When we stream it to Data Lake, some data cannot be loaded. It should be a priority to fix this.

For how long have I used the solution?

I've been using the solution for half a year.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

The solution is scalable. However, it still needs us to manually set the number of nodes in a cluster. It's really dependent on the application. Sometimes, when the tasks are bigger, it gets a little difficult for us to define the number of nodes in a cluster. If the solution could set up the clusters for users, I think that would be good.

Currently, we have three people using the solution. We may increase usage in the future.

How are customer service and technical support?

The technical support is quite good. In the beginning, when we had a few POC projects, they were very supportive.

Which solution did I use previously and why did I switch?

We didn't previously use a different solution; instead, we built our own from scratch. This is the first unified platform that we've used.

How was the initial setup?

The initial setup is very straightforward. We just use their job functions. To deploy as a spark job is quite straightforward. 

In our use case, we also had some external databases to handle the deployment. For example, we only generated some prediction results. We saved the results into an external database. The solution takes time to deploy to the external database, but the spark job is quite easy.

What other advice do I have?

I'm a software development engineer. I'm working with the latest version.

As long as the developers understand Spark and its technical tricks, it's very fast in terms of using the database.

I'd rate the solution eight out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
it_user1149000 - PeerSpot reviewer
Data Science Developer at a tech services company with 501-1,000 employees
Real User
Dec 12, 2019
Good performance and support for big data, built-in machine learning libraries are powerful
Pros and Cons
  • "Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great."
  • "It should have more compatible and more advanced visualization and machine learning libraries."

What is our primary use case?

We use this solution for streaming analytics. We use machine learning functions that output to the API and work directly with the database.

How has it helped my organization?

Prior to using Azure Databricks in the cloud, we had Databricks installed in clusters. Since our implementation, the performance has increased and our cost has been reduced.

What is most valuable?

Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great.

This solution has very good machine learning libraries built-in.

The support for big data is good.

What needs improvement?

Databricks should have more libraries for predictive analysis and machine learning.

It should have more compatible and more advanced visualization and machine learning libraries. As it is now, I have to try a custom algorithm in order for things to be compatible.

I would like to see more deep learning analytics.

For how long have I used the solution?

I have been using Databricks for about one year.

What do I think about the stability of the solution?

This is a cluster-based solution, so it is stable.

What do I think about the scalability of the solution?

We started using Databricks with a small PoC application, and then we developed it into a larger one. It's scalable, and it's a simple process to scale.

We have eight people in our team who are using this solution. We do not plan to increase usage at this time.

How are customer service and technical support?

I did not contact technical support myself, but when one of our team members contacted them they were given good answers. I would say that the support is good.

How was the initial setup?

It is not difficult to deploy this solution because it is well documented. We followed the normal steps that included all of the APIs.

What's my experience with pricing, setup cost, and licensing?

I do not exactly know the costs, but one of our clients pays between $100 USD and $200 USD monthly.

What other advice do I have?

Databricks has been good and I like it. However, it would be improved with the enhancement of the machine learning libraries, and with the inclusion of visualization libraries.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
it_user1146978 - PeerSpot reviewer
Business Intelligence and Analytics Consultant at a tech services company with 201-500 employees
Consultant
Dec 12, 2019
Easy to switch loads between clusters and automation is easy using the API
Pros and Cons
  • "Automation with Databricks is very easy when using the API."
  • "Some of the error messages that we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems."

What is our primary use case?

I am a developer and I do a lot of consulting using Databricks.

We have been primarily using this solution for ETL purposes. We also do some migration of on-premises data to the cloud.

What is most valuable?

The most valuable feature is the ability to switch loads between multiple clusters.

Automation with Databricks is very easy when using the API.

The ability to write code and SQL in the same interface is useful.

It is easy to connect notebooks to a cluster.

There are a large number of inbuilt functions that help to make things easier.
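
As one hedged illustration of the API-driven automation mentioned above, the sketch below builds (but does not send) a REST request that triggers an existing job through the Databricks Jobs API `run-now` endpoint. The helper name, workspace URL, token, and job ID are all hypothetical placeholders, and the API version in the path is an assumption about the current endpoint rather than what the reviewer used.

```python
import json
import urllib.request


def build_run_now_request(host, token, job_id):
    """Build, without sending, a POST request that triggers an existing job.

    Hypothetical helper: host, token, and job_id are supplied by the caller.
    """
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/2.1/jobs/run-now",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder workspace URL, token, and job ID (assumptions for illustration).
req = build_run_now_request("https://example.cloud.databricks.com", "<token>", 123)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would actually submit it; a scheduler or CI pipeline could call a helper like this to start jobs without going through the UI.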

What needs improvement?

Some of the error messages that we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems. As it is now, we have to go into the driver logs to identify the error messages properly. 

There is not much information about Databricks available online, such as cost. Whenever we want to find the actual costing, we have to send an email to Databricks, so having the information available on the internet would be helpful.

I would like to see integration with Power BI or Tableau for the business users. They may use Databricks to check on things, but it will be a little bit complicated for them. The GUI interfaces for Tableau and Power BI are ones that they are used to, so the integration would help.

For how long have I used the solution?

I have been using Databricks for about five and a half years.

What do I think about the stability of the solution?

We have found that in the development environment, Databricks is pretty stable. We have had problems where something works in development but does not work in production, and this can happen when the version is updated and certain features have been deprecated. This means that more testing is required before moving to production, but this is the only drawback that we have seen.

Basically, when we move across versions we have found issues, but otherwise, it's pretty stable.

What do I think about the scalability of the solution?

Scalability is one of the main features of Databricks. We have used datasets that are one hundred megabytes in size up to one terabyte, and we can manage, so it's easily scalable.

We have a large company with between 400 and 500 people using this solution.

How are customer service and technical support?

We have not reached out for technical support on Databricks.

How was the initial setup?

I found the initial setup easy because I had previously worked on Spark.

If somebody goes through the training, which is available on the website, then it should be straightforward. I don't think that it is very hard.

When it comes to developing things based on use cases, it can take between three days and two weeks, plus two to three days for testing and deploying it. I would say that for an entire use case, it will take a maximum of three weeks.

What other advice do I have?

My advice for developers who are interested in working with this solution is to first go through the Spark architecture.

I would rate this solution a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user