We use this solution for information gathering and processing.
I use it myself when I am developing on my laptop.
I am currently using an on-premises deployment model. However, in a few weeks, I will be using the EMR version in the cloud.
The most valuable feature of this solution is its capacity for processing large amounts of data.
This solution makes it easy to do a lot of things. It's easy to read data, process it, save it, etc.
When you first start using this solution, it is common to run into out-of-memory errors when dealing with large amounts of data. Once you are experienced, it becomes easier and more stable.
When you are trying to do something outside of the normal requirements in a typical project, it is difficult to find somebody with experience.
This solution is difficult for beginners, who often experience out-of-memory errors when dealing with large amounts of data.
I have not been in contact with technical support. I find all of the answers that I need in the forums.
The work that we are doing with this solution is quite common and is very easy to do.
My advice for anybody who is implementing this solution is to look at their needs and then look at the community. Normally, there are a lot of people who have already done what you need. So, even without experience, it is quite simple to do a lot of things.
I would rate this solution a nine out of ten.
We use the solution for analytics.
I'm not sure how it has improved my organization but I believe that it's a good product.
The fast performance is the most valuable aspect of the solution.
The search could be improved. Currently, we use other tools to search for specific items and then use Spark to get the details; built-in support for searching fine-grained things would be better.
It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster.
In the next release, if they can add more built-in analytics, that would be useful, along with better logging for scripts.
I found the solution stable. We haven't had any problems with it.
Usually, we can fix any issues. If we have problems, we google a little bit to find the issue.
I was using some other systems and we moved to Spark later. We faced performance and other issues with the other solution.
The initial setup was easy. We keep on getting data from different sources so we will keep on porting in little bits. It's not done in a single sitting, so I can't really say how long it takes.
I would recommend the solution. I would rate it an eight or nine out of 10.
For some areas I would give it a ten, but I cannot use certain parts. If you are going to use it for a single customer, I would recommend it and you should go ahead. It doesn't work as well for me because I have different clients and different engagements.
We primarily use the solution for security analytics.
The scalability has been the most valuable aspect of the solution.
The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive.
The 2.3 version is quite stable. All of our customers use it; there are around 100,000 users, and it runs 24/7.
The scalability is very good.
You actually buy Cloudera along with it. You don't really get any support unless you pay for it.
In previous companies, we used MySQL platform and solutions like ArcSight and Splunk. We switched for scalability. MySQL wasn't going to scale, and we don't use Splunk at this company.
The initial setup was complex. It is a complex tool, and a lot depends on how you will use it. There is a lot to set up, including nearly 60 scripts. Setting it up in the cloud takes about a day; setting it up on-premises, on hardware, takes about a week.
I would rate this solution eight out of 10.
Streaming telematics data.
It's a better MapReduce: it supports streaming and micro-batch, and it supports Spark ML and Spark SQL.
It supports streaming and micro-batch.
Better data lineage support.
Ingesting billions of rows of data all day.
Spark on AWS is not that cost-effective, as memory is expensive and you cannot customize hardware in AWS. If you want more memory, you have to pay for more CPUs as well.
Powerful language.
Writing efficient programs requires complicated coding; it is like going back to the '80s.
Used for building big data platforms for processing huge volumes of data. Additionally, streaming data is critical.
It provides a scalable machine learning library so that we can train and predict user behavior for promotion purposes.
Machine learning, real time streaming, and data processing are fantastic, as well as the resilient or fault tolerant feature.
I would suggest support for more programming languages, and also an internal scheduler to schedule Spark jobs, with monitoring capability.
Organisations can now harness richer data sets and benefit from use cases, which add value to their business functions.
Distributed in-memory processing. Some of the algorithms are resource-heavy, and executing them requires a lot of RAM and CPU. With Hadoop-related technologies, we can distribute the workload across multiple commodity machines.
Include more machine learning algorithms and the ability to handle true streaming of data, rather than micro-batch processing.
At times, when users do not know how to use Spark and request too many resources, the underlying JVMs can crash, which is a big worry.
No issues.
DataFrame: Spark SQL lets you create applications more easily and with less coding effort.
We developed a tool for data ingestion from HDFS through the Raw and L1 layers, with data quality checks, loading data into Elasticsearch, and performing CDC (change data capture).
Dynamic DataFrame options are not yet available.
One and a half years.
No.
No.
Spark gives the flexibility for developing custom applications.
