it_user371325 - PeerSpot reviewer
Data Scientist at a tech vendor with 10,001+ employees
Vendor

What is most valuable?

It allows the loading and investigation of very large datasets, and it has MLlib for machine learning, Spark Streaming, and both the new and old DataFrame APIs.

How has it helped my organization?

We're able to perform data discovery on large datasets without too much difficulty.

What needs improvement?

It needs better documentation as well as examples for all the Spark libraries. That would be very helpful in maximizing its capabilities and results.

For how long have I used the solution?

I've used it for over nine months now.

Buyer's Guide
Apache Spark
August 2025
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: August 2025.
865,295 professionals have used our research since 2012.

What was my experience with deployment of the solution?

I haven't encountered any issues with deployment.

What do I think about the stability of the solution?

There have been no stability issues.

What do I think about the scalability of the solution?

I haven't had any scalability issues. It scales better than the Python and R tools I used before.

How are customer service and support?

Customer Service:

I haven't had to use customer service.

Technical Support:

I haven't had to use technical support.

Which solution did I use previously and why did I switch?

I previously used Python and R, but neither of these scaled particularly well.

How was the initial setup?

The initial setup was complex. It was not easy to get the correct versions and dependencies set up.

What about the implementation team?

I implemented it in-house on my own!

What was our ROI?

It's open-source, so ROI is inapplicable.

What other advice do I have?

Learn Scala, as this will greatly reduce the pain of getting started with Spark.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
it_user365301 - PeerSpot reviewer
Software Developer (Product Engineering) at a computer software company with 501-1,000 employees
Vendor

Valuable Features:

Spark Streaming, Spark SQL, and MLlib, in that order.

Improvements to My Organization:

We have been using Spark to do a lot of batch and stream processing of inbound data from Apache Kafka. Scaling Spark on YARN is still an issue, but we are getting acceptable performance.

Room for Improvement:

As I said, scalability is still an issue, and so is stability. Spark on YARN still doesn't seem to have a programmatic submission API, so we have to rely on the spark-submit script to run jobs on YARN. The Scala and Java APIs have performance differences, which sometimes require us to code in Scala.
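Because the reviewer relies on the `spark-submit` script to run jobs on YARN, here is a minimal sketch of wrapping that script from Python with the standard `subprocess` module. The flags shown (`--master yarn`, `--deploy-mode`, `--name`, `--num-executors`, `--executor-memory`, `--executor-cores`) are standard `spark-submit` options. The application file, job name, and resource values are hypothetical placeholders that would need tuning per cluster.

```python
import subprocess
from typing import List, Sequence


def build_submit_command(
    app: str,
    app_args: Sequence[str] = (),
    name: str = "kafka-ingest",  # hypothetical job name
    deploy_mode: str = "cluster",
    num_executors: int = 4,  # placeholder resource values
    executor_memory: str = "4g",
    executor_cores: int = 2,
) -> List[str]:
    """Assemble a spark-submit command line targeting YARN."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--name", name,
        "--num-executors", str(num_executors),
        "--executor-memory", executor_memory,
        "--executor-cores", str(executor_cores),
        app,
        *app_args,  # arguments passed through to the application itself
    ]


def submit(app: str, *app_args: str) -> int:
    # Runs spark-submit and returns its exit code; requires spark-submit on PATH.
    return subprocess.run(build_submit_command(app, app_args)).returncode
```

Building the argument list separately from running it keeps resource settings in one reviewable place. It also makes the command easy to check without a cluster.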

Other Advice:

Have Scala developers on hand. Basic Java competency will not be enough during optimization rounds.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
reviewer1904019 - PeerSpot reviewer
Chief Technology Officer at a tech services company with 11-50 employees
Real User
Helpful support, easy to use, and high availability
Pros and Cons
  • "The most valuable feature of Apache Spark is its ease of use."
  • "Apache Spark could improve the use-case scenarios on its website. There is no information on how to use the solution across relational databases when targeting multiple databases."

What is our primary use case?

I use Apache Spark for data transfer from databases. We have customers who use a single database as their data lake.

What is most valuable?

The most valuable feature of Apache Spark is its ease of use.

What needs improvement?

Apache Spark could improve the use-case scenarios on its website. There is no information on how to use the solution across relational databases when targeting multiple databases.

For how long have I used the solution?

I have been using Apache Spark for approximately 18 months.

What do I think about the stability of the solution?

Apache Spark is stable.

What do I think about the scalability of the solution?

We are using Apache Spark across multiple nodes and it is scalable.

We have approximately five people using this solution.

How are customer service and support?

The technical support from Apache Spark is very good.

What other advice do I have?

I rate Apache Spark an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user