Spark Streaming, Spark SQL and MLlib, in that order.
Software Developer (Product Engineering) at a computer software company with 501-1,000 employees
What is most valuable?
How has it helped my organization?
We have been using Spark to do a lot of batch and stream processing of inbound data from Apache Kafka. Scaling Spark on YARN is still an issue but we are getting acceptable performance.
What needs improvement?
As I said, scalability is still an issue, as is stability. Spark on YARN still doesn't seem to have a programmatic submission API, so we have to rely on the spark-submit script to run jobs on YARN. The Scala and Java APIs also differ in performance, which sometimes makes it necessary to code in Scala.
What other advice do I have?
Have Scala developers on hand. Basic Java competency will not be enough during optimization rounds.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Chief Technology Officer at a tech services company with 11-50 employees
Helpful support, easy to use, and high availability
Pros and Cons
- "The most valuable feature of Apache Spark is its ease of use."
- "Apache Spark could improve the use case scenarios on its website. There is no information on how to use the solution across relational databases and toward multiple databases."
What is our primary use case?
I use Apache Spark to transition data out of databases. We have customers who use a single database as a data lake.
What is most valuable?
The most valuable feature of Apache Spark is its ease of use.
What needs improvement?
Apache Spark could improve the use case scenarios on its website. There is no information on how to use the solution across relational databases and toward multiple databases.
For how long have I used the solution?
I have been using Apache Spark for approximately 18 months.
What do I think about the stability of the solution?
Apache Spark is stable.
What do I think about the scalability of the solution?
We are using Apache Spark across multiple nodes and it is scalable.
We have approximately five people using this solution.
How are customer service and support?
The technical support from Apache Spark is very good.
What other advice do I have?
I rate Apache Spark an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Buyer's Guide
Download our free Apache Spark Report and get advice and tips from experienced pros
sharing their opinions.
Updated: June 2025