
Apache Spark pros and cons

Vendor: Apache
4.2 out of 5
Ranked 1

Pros & Cons summary

Prominent pros & cons

PROS

Apache Spark offers exceptional speed and efficiency in data processing, significantly outperforming traditional tools by effectively managing parallel processing and handling large data volumes.
It boasts strong scalability, enabling efficient handling of extensive datasets by distributing workloads across multiple nodes, which enhances performance and flexibility.
The in-memory processing engine is highly valuable, allowing for rapid data handling and processing by utilizing RAM rather than disk storage, which leads to enhanced execution speed.
Apache Spark supports a wide range of machine learning processes with a scalable library, providing valuable functionalities for real-time data processing, streaming, and analytical tasks.
The flexibility and user-friendly nature of Apache Spark make it easy to deploy and integrate seamlessly with existing processes, supported by clear documentation, which increases its adoption in various industries.

CONS

Apache Spark requires significant technical expertise to deploy and run, making it challenging for users without a technical background.
Apache Spark lacks support for certain machine learning libraries, models, and neural network-related algorithms, limiting its use in some applications.
Apache Spark's initial setup and installation are complex, demanding a considerable learning curve for practitioners.
Optimization techniques in Apache Spark have limitations, particularly when handling large data sets, affecting performance and efficiency.
Integration with popular databases and third-party platforms needs improvement, as current support often requires workarounds.
 

Apache Spark Pros review quotes

Devindra Weerasooriya - PeerSpot reviewer
Data Architect at Devtech
Nov 20, 2025
Apache Spark, specifically PySpark and the tools available there, has been quite helpful in my event analysis work.
ML
Consultant, Chief Engineer, Teamleiter at infoteam Software AG
Feb 27, 2026
The best features in Apache Spark that I appreciate are the fast database access, the data transformation, and the data exchange.
Omar Khaled - PeerSpot reviewer
Data Engineer at a tech company with 10,001+ employees
Aug 12, 2025
Apache Spark resolves many problems in the MapReduce solution and Hadoop, such as the inability to run effective Python or machine learning algorithms.
Learn what your peers think about Apache Spark. Get advice and tips from experienced pros sharing their opinions. Updated: May 2026.
893,164 professionals have used our research since 2012.
Dunstan Matekenya - PeerSpot reviewer
Data Scientist at a financial services firm with 10,001+ employees
Jul 10, 2024
Apache Spark is known for its ease of use. Compared to other available data processing frameworks, it is user-friendly.
Madhan Potluri - PeerSpot reviewer
Head of Data at an energy/utilities company with 51-200 employees
Aug 5, 2024
The product's initial setup phase was easy.
KamleshPant - PeerSpot reviewer
Senior Software Architect at USEReady
Apr 24, 2025
Apache Spark's ability to handle both batch and streaming data is the most valuable feature for me as it offers solid real-time processing capability, making it more efficient in managing data analytics.
reviewer2534727 - PeerSpot reviewer
Manager Data Analytics at an outsourcing company with 5,001-10,000 employees
Aug 12, 2024
I like Apache Spark's flexibility the most. Before, we had one server that would choke up. With the solution, we can easily add more nodes when needed. The machine learning models are also really helpful. We use them to predict energy theft and find infrastructure problems.
Bharghava Raghavendra Beesa - PeerSpot reviewer
Senior Developer at Infosys
Jan 21, 2025
Spark is used for transformations on large volumes of data, and its distributed processing is useful.
SS
Sr Manager at a transportation company with 10,001+ employees
Dec 6, 2023
We use it for ETL purposes as well as for implementing the full transformation pipelines.
AM
Head of Data Science center of excellence at Ameriabank CJSC
Sep 23, 2024
The most significant advantage of Spark 3.0 is its support for DataFrame UDF and Pandas UDF features.
 

Apache Spark Cons review quotes

Devindra Weerasooriya - PeerSpot reviewer
Data Architect at Devtech
Nov 20, 2025
In many of my experiments, the data set has had to be partitioned, and there have been issues in handling very large data sets. Most of my work is done using Python machine learning libraries, which requires chunking. Speed of prediction has been a concern in some experiments where we have had to shut down processes due to CPU requirements and then restart with different Apache configurations. If I were to name one constraint in running machine learning experiments, resourcing support is a major determinant.
ML
Consultant, Chief Engineer, Teamleiter at infoteam Software AG
Feb 27, 2026
I do not know exactly what was the reason to move away from Apache Spark or the underlying database system, but it was simply a decision driven by the customer.
Omar Khaled - PeerSpot reviewer
Data Engineer at a tech company with 10,001+ employees
Aug 12, 2025
The basic improvement would be to have integration with these solutions.
Dunstan Matekenya - PeerSpot reviewer
Data Scientist at a financial services firm with 10,001+ employees
Jul 10, 2024
Apache Spark lacks geospatial data support.
Madhan Potluri - PeerSpot reviewer
Head of Data at an energy/utilities company with 51-200 employees
Aug 5, 2024
From my perspective, the only thing that needs improvement is the interface, as it was not easily understandable.
KamleshPant - PeerSpot reviewer
Senior Software Architect at USEReady
Apr 24, 2025
There is complexity when it comes to understanding the whole ecosystem, especially for beginners.
reviewer2534727 - PeerSpot reviewer
Manager Data Analytics at an outsourcing company with 5,001-10,000 employees
Aug 12, 2024
For improvement, I think the tool could make things easier for people who aren't very technical. There's a significant learning curve, and I've seen organizations give up because of it. Making it quicker or easier for non-technical people would be beneficial.
Bharghava Raghavendra Beesa - PeerSpot reviewer
Senior Developer at Infosys
Jan 21, 2025
The Spark solution could improve in scheduling tasks and managing dependencies.
SS
Sr Manager at a transportation company with 10,001+ employees
Dec 6, 2023
Apart from the restrictions that come with its in-memory implementation, it has been improved significantly up to version 3.0, which is currently in use.
AM
Head of Data Science center of excellence at Ameriabank CJSC
Sep 23, 2024
The main concern is the overhead of Java when distributed processing is not necessary.