it_user365301 - PeerSpot reviewer
Software Developer (Product Engineering) at a computer software company with 501-1,000 employees
Vendor
We have been using Spark to do a lot of batch and stream processing of inbound data from Apache Kafka. Scaling Spark on YARN is still an issue but we are getting acceptable performance.

What is most valuable?

Spark Streaming, Spark SQL, and MLlib, in that order.

How has it helped my organization?

We have been using Spark to do a lot of batch and stream processing of inbound data from Apache Kafka. Scaling Spark on YARN is still an issue but we are getting acceptable performance.

What needs improvement?

As noted, scalability is still an issue, as is stability. Spark on YARN still doesn't seem to have a programmatic submission API, so we have to rely on the spark-submit script to run jobs on YARN. The Scala and Java APIs differ in performance, which sometimes makes it necessary to code in Scala.
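
One way to work around relying on the spark-submit script is to wrap it in a small helper that assembles the command line and runs it as a subprocess. The following is only a minimal sketch under stated assumptions: the helper names `build_spark_submit_command` and `submit`, the jar name `app.jar`, and the class `com.example.Job` are hypothetical and do not come from the review. The flags used (`--master yarn`, `--deploy-mode`, `--class`, `--conf`) are standard spark-submit options.

```python
import subprocess
from typing import Dict, List, Optional


def build_spark_submit_command(
    app_jar: str,
    main_class: str,
    app_args: Optional[List[str]] = None,
    conf: Optional[Dict[str, str]] = None,
    deploy_mode: str = "cluster",
) -> List[str]:
    """Assemble the argument list for a spark-submit run against YARN."""
    cmd = [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--class", main_class,
    ]
    # Sort the settings so the generated command is deterministic.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application jar comes after all options, followed by its own arguments.
    cmd.append(app_jar)
    cmd += list(app_args or [])
    return cmd


def submit(cmd: List[str]) -> int:
    """Run the command and return its exit code (needs spark-submit on PATH)."""
    return subprocess.run(cmd, check=False).returncode


# Hypothetical usage: the jar, class, and argument are placeholders.
command = build_spark_submit_command(
    "app.jar",
    "com.example.Job",
    app_args=["input-topic"],
    conf={"spark.executor.memory": "4g"},
)
```

Passing the arguments as a list, rather than a single shell string, avoids quoting problems when values contain spaces. Calling `submit(command)` only works on a machine where Spark is installed and configured for the YARN cluster.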

What other advice do I have?

Have Scala developers at hand. Base Java competency will not be enough during optimization rounds.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
reviewer1904019 - PeerSpot reviewer
Chief Technology Officer at a tech services company with 11-50 employees
Real User
Helpful support, easy to use, and high availability
Pros and Cons
  • "The most valuable feature of Apache Spark is its ease of use."
  • "Apache Spark can improve the use case scenarios from the website. There is not any information on how you can use the solution across the relational databases toward multiple databases."

What is our primary use case?

I use Apache Spark to transfer data out of databases. We have customers who use a single database as a data lake.

What is most valuable?

The most valuable feature of Apache Spark is its ease of use.

What needs improvement?

The use case scenarios on the Apache Spark website could be improved. There is no information on how to use the solution across relational databases or with multiple databases.

For how long have I used the solution?

I have been using Apache Spark for approximately 18 months.

What do I think about the stability of the solution?

Apache Spark is stable.

What do I think about the scalability of the solution?

We are using Apache Spark across multiple nodes and it is scalable.

We have approximately five people using this solution.

How are customer service and support?

The technical support from Apache Spark is very good.

What other advice do I have?

I rate Apache Spark an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Buyer's Guide
Download our free Apache Spark Report and get advice and tips from experienced pros sharing their opinions.
Updated: June 2025