Apache Spark pros and cons

Vendor: Apache

4.2 out of 5

69 reviews
90% willing to recommend

Pros & Cons summary

Apache Spark offers remarkable speed and efficiency in data processing: it runs operations in parallel over large datasets and uses an in-memory engine for rapid execution. It scales well to extensive datasets, but it requires significant technical expertise to set up and lacks support for some machine learning libraries. Difficult optimization and awkward integration with databases can also hold back its performance. Even so, its user-friendly nature, flexibility, and documentation ease deployment and have supported its adoption across industries.

Prominent pros & cons

PROS

Apache Spark offers exceptional speed and efficiency in data processing, significantly outperforming traditional tools by effectively managing parallel processing and handling large data volumes.

It boasts strong scalability, enabling efficient handling of extensive datasets by distributing workloads across multiple nodes, which enhances performance and flexibility.

The in-memory processing engine is highly valued: by keeping working data in RAM rather than on disk, it speeds up both data handling and execution.

Apache Spark supports a wide range of machine learning tasks through its scalable library, MLlib, and also provides capabilities for real-time data processing, streaming, and analytics.

The flexibility and user-friendly nature of Apache Spark make it easy to deploy and to integrate with existing processes, and its clear documentation has helped increase its adoption in various industries.
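
The speed, scalability, and in-memory points above rest on two ideas: split the data into partitions that are processed independently and then combined, and keep reused data in memory instead of re-reading it. The sketch below illustrates both in plain Python using only the standard library. It is an analogy, not Spark's API, and the function names and partition count are assumptions made for the example. Threads stand in for Spark's executors, so because of CPython's global interpreter lock the code shows the structure of the computation rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous, roughly equal slices."""
    size, extra = divmod(len(data), num_partitions)
    partitions, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        partitions.append(data[start:end])
        start = end
    return partitions


def sum_of_squares(partition):
    """Per-partition work, done independently of other partitions (the 'map' side)."""
    return sum(x * x for x in partition)


def run_on_partitions(func, partitions, workers=4):
    """Apply func to each partition concurrently, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(func, partitions))
    return sum(partials)  # the 'reduce' side: merge the partial results


# Partition once and keep the partitions in memory so both jobs below reuse them,
# loosely like caching a dataset instead of re-reading it from storage.
cached = split_into_partitions(list(range(1, 1001)), num_partitions=4)
total = run_on_partitions(sum_of_squares, cached)
count = run_on_partitions(len, cached)
print(total, count)  # 333833500 1000
```

In Spark itself the same partition, map, and combine pattern would be expressed on an RDD or DataFrame and spread across a cluster; the sketch only mirrors its shape on one machine.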

CONS

Apache Spark requires significant technical expertise to deploy and operate, making it challenging for users without a technical background.

Apache Spark lacks support for certain machine learning libraries, models, and neural network-related algorithms, limiting its use in some applications.

Apache Spark's initial setup and installation are complex, and practitioners face a considerable learning curve.

Optimizing Apache Spark jobs has its limits, particularly with large datasets, which affects performance and efficiency.

Integration with popular databases and third-party platforms needs improvement, as current support often requires workarounds.

Apache Spark Pros review quotes

Devindra Weerasooriya

Data Architect at Devtech

Nov 20, 2025

Apache Spark, specifically PySpark and the tools available there, have been quite helpful in my event analysis work.

Michael Lierheimer

Consultant, Chief Engineer, Teamleiter at infoteam Software AG

Feb 27, 2026

The best features in Apache Spark that I appreciate are the fast database access, the data transformation, and the data exchange.

Data Engineer at a tech company with 10,001+ employees

Aug 12, 2025

Apache Spark resolves many problems in the MapReduce solution and Hadoop, such as the inability to run effective Python or machine learning algorithms.

Free Report: Apache Spark Reviews and More

Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: June 2026.

900,644 professionals have used our research since 2012.

Dunstan Matekenya

Data Scientist at a financial services firm with 10,001+ employees

Jul 10, 2024

Apache Spark is known for its ease of use. Compared to other available data processing frameworks, it is user-friendly.

Head of Data at an energy/utilities company with 51-200 employees

Aug 5, 2024

The product's initial setup phase was easy.

reviewer2534727

Manager Data Analytics at an outsourcing company with 5,001-10,000 employees

Aug 12, 2024

I like Apache Spark's flexibility the most. Before, we had one server that would choke up. With the solution, we can easily add more nodes when needed. The machine learning models are also really helpful. We use them to predict energy theft and find infrastructure problems.

Senior Software Architect at USEReady

Apr 24, 2025

Apache Spark's ability to handle both batch and streaming data is the most valuable feature for me as it offers solid real-time processing capability, making it more efficient in managing data analytics.

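The reviewer's point about handling batch and streaming data with one tool can be illustrated with a plain-Python sketch that uses only the standard library; it is not Spark's Structured Streaming API. The same transformation function is applied once to a complete dataset and, separately, to a stream consumed in small micro-batches whose partial results are merged into running state, so the final streaming snapshot matches the batch result. The event data, batch size, and function names are hypothetical.

```python
from collections import Counter
from itertools import islice


def count_events(records):
    """The shared transformation: count events per event type."""
    return Counter(event_type for event_type, _ in records)


def run_batch(records):
    """Batch mode: process the whole dataset in one pass."""
    return count_events(records)


def micro_batches(stream, batch_size):
    """Group a (possibly unbounded) iterator into fixed-size micro-batches."""
    it = iter(stream)
    while batch := list(islice(it, batch_size)):
        yield batch


def run_streaming(stream, batch_size=2):
    """Streaming mode: apply the same transformation to each micro-batch,
    merge it into running state, and emit an updated snapshot each time."""
    state = Counter()
    for batch in micro_batches(stream, batch_size):
        state.update(count_events(batch))
        yield dict(state)


# Hypothetical (event_type, timestamp) records.
events = [("click", 1), ("view", 2), ("click", 3), ("buy", 4), ("view", 5)]

batch_result = run_batch(events)
snapshots = list(run_streaming(iter(events), batch_size=2))
print(snapshots[-1] == dict(batch_result))  # True
```

Keeping the transformation in one function that both modes call is the design point: the business logic is written once, and only the way data arrives differs.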

Bharghava Raghavendra Beesa

Senior Developer at Infosys

Jan 21, 2025

Spark is used for transformations from large volumes of data, and it is usefully distributed.

Sr Manager at a transportation company with 10,001+ employees

Dec 6, 2023

We use it for ETL purposes as well as for implementing the full transformation pipelines.

Aleksandr Motuzov

Head of Data Science center of excellence at Ameriabank CJSC

Sep 23, 2024

The most significant advantage of Spark 3.0 is its support for DataFrame UDF and Pandas UDF features.

Apache Spark Cons review quotes

Devindra Weerasooriya

Data Architect at Devtech

Nov 20, 2025

Very often in many of my experiments, the data set has had to be partitioned, and there have been issues in handling very large data sets. Most of my work is done using Python machine learning libraries, which requires chunking. Speed of prediction has been a concern in some experiments, where we have had to shut down processes due to CPU requirements and then restart with different Apache configurations. If I were to name a constraint in terms of running machine learning experiments, resourcing support is a major determinant.

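The chunking the reviewer describes, feeding a large dataset to a single-machine Python model in pieces, can be sketched with the standard library alone. This is a generic pattern, not part of Apache Spark: a generator yields fixed-size chunks, and a model's batch prediction function (here a hypothetical threshold stand-in, `toy_predict`) is called once per chunk, so only one chunk of input is materialized at a time. The chunk size is an assumption to tune against available memory.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def chunked(rows: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size rows from any iterable."""
    it = iter(rows)
    while chunk := list(islice(it, chunk_size)):
        yield chunk


def predict_in_chunks(
    rows: Iterable[float],
    predict: Callable[[List[float]], List[int]],
    chunk_size: int = 1000,
) -> Iterator[int]:
    """Call predict() once per chunk instead of on the whole dataset at once."""
    for chunk in chunked(rows, chunk_size):
        yield from predict(chunk)


def toy_predict(chunk: List[float]) -> List[int]:
    """Hypothetical stand-in for a trained model's batch predict method."""
    return [1 if x > 0.5 else 0 for x in chunk]


# A generator source, so the input is never held in memory as one full list.
source = (i / 10 for i in range(10))
predictions = list(predict_in_chunks(source, toy_predict, chunk_size=3))
print(predictions)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
```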

Michael Lierheimer

Consultant, Chief Engineer, Teamleiter at infoteam Software AG

Feb 27, 2026

I do not know exactly what the reason was for moving away from Apache Spark or the underlying database system, but it was simply a decision driven by the customer.

Data Engineer at a tech company with 10,001+ employees

Aug 12, 2025

The basic improvement would be to have integration with these solutions.

Dunstan Matekenya

Data Scientist at a financial services firm with 10,001+ employees

Jul 10, 2024

Apache Spark lacks support for geospatial data.

Head of Data at an energy/utilities company with 51-200 employees

Aug 5, 2024

From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable.

reviewer2534727

Manager Data Analytics at an outsourcing company with 5,001-10,000 employees

Aug 12, 2024

For improvement, I think the tool could make things easier for people who aren't very technical. There's a significant learning curve, and I've seen organizations give up because of it. Making it quicker or easier for non-technical people would be beneficial.

Senior Software Architect at USEReady

Apr 24, 2025

There is complexity when it comes to understanding the whole ecosystem, especially for beginners.

Bharghava Raghavendra Beesa

Senior Developer at Infosys

Jan 21, 2025

The Spark solution could improve in scheduling tasks and managing dependencies.

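The dependency management the reviewer asks for amounts to ordering tasks in a directed acyclic graph (DAG). Spark's own scheduler builds a DAG of stages within a single job; the hedged sketch below applies the same idea at the level of whole pipeline tasks, using Python's standard `graphlib` module (Python 3.9+) rather than any Spark or workflow-scheduler API. It groups a hypothetical ETL pipeline's tasks into waves: every task in a wave has all of its dependencies satisfied, so tasks within one wave could run concurrently. The task names are invented for the example.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_customers"},
    "load_warehouse": {"join"},
}


def execution_waves(graph):
    """Group tasks into waves whose dependencies are all complete,
    so tasks within one wave could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking its dependents
    return waves


for i, wave in enumerate(execution_waves(pipeline), start=1):
    print(i, wave)
```

Run as written, this prints four waves: the two extract tasks first, then `clean_orders`, then `join`, then `load_warehouse`.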

Sr Manager at a transportation company with 10,001+ employees

Dec 6, 2023

Apart from the restrictions that come with its in-memory implementation, it has been improved significantly up to version 3.0, which is currently in use.

Aleksandr Motuzov

Head of Data Science center of excellence at Ameriabank CJSC

Sep 23, 2024

The main concern is the overhead of Java when distributed processing is not necessary.

Product Categories

Compute Service

Java Frameworks

Popular Comparisons

Spring Boot vs Apache Spark

Spot by Flexera vs Apache Spark

AWS Lambda vs Apache Spark

Cloudera Distribution for Hadoop vs Apache Spark

IBM Netezza Performance Server vs Apache Spark

Amazon EC2 vs Apache Spark

Amazon EMR vs Apache Spark

AWS Fargate vs Apache Spark

IBM Spectrum Computing vs Apache Spark

Apache NiFi vs Apache Spark

Jakarta EE vs Apache Spark

HPE Data Fabric vs Apache Spark

AWS Batch vs Apache Spark

Amazon EC2 Auto Scaling vs Apache Spark

Helidon vs Apache Spark

Related questions

Which is the best RDBMS solution for big data?

Apache Spark without Hadoop -- Is this recommended?

Which solution has better performance: Spring Boot or Apache Spark?

AWS EMR vs Hadoop

Handling real and fast data - how do BigInsight and other solutions perform?

When evaluating Hadoop, what aspect do you think is the most important to look for?

Should we choose InfoSphere BigInsights or Cloudera?