Apache Spark Streaming Reviews and Pricing

Aleksandr Motuzov

Head of Data Science center of excellence at Ameriabank CJSC

Nov 26, 2024

Boosts performance with micro-batch streaming and detailed documentation

Pros and Cons

"Spark Streaming is critical, quite stable, full-featured, and scalable."

"We don't have enough experience to be judgmental about its flaws."

What is our primary use case?

We use Spark Streaming in micro-batch mode. It's not a full real-time system, but it offers high performance and low latency.
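
To illustrate what micro-batching means in general, here is a minimal pure-Python sketch. It is not Spark's API and not the reviewer's code; the `Event` shape, the `micro_batches` function, and the one-second interval are all hypothetical names and values chosen for illustration. The idea is that incoming events are grouped into short, fixed-length time windows and each window is then processed as one small batch.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp in seconds, payload).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> List[List[str]]:
    """Group events into consecutive fixed-length time windows."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for timestamp, payload in events:
        # Events whose timestamps fall in the same window share a bucket index.
        buckets[int(timestamp // interval)].append(payload)
    # Return the batches in time order; each one would be processed as a small job.
    return [buckets[index] for index in sorted(buckets)]


events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]
batches = micro_batches(events, interval=1.0)
print(batches)
```

A shorter interval lowers latency at the cost of more, smaller jobs, which is the trade-off behind "not fully real-time, but low latency."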

What is most valuable?

Spark Streaming is critical, quite stable, full-featured, and scalable. It has a low latency and high performance, comparable to functions that can be called by triggers. It is well-designed with good documentation, making it easy to find solutions.

What needs improvement?

We don't have enough experience to be judgmental about its flaws, as we've only used stable features like micro-batch processing. Integration poses no problem; however, there are some features I don't use and can't judge.

For how long have I used the solution?

I have used it intensively for maybe one year.

Buyer's Guide

Apache Spark Streaming

June 2026

Free Report: Apache Spark Streaming Reviews and More

Learn what your peers think about Apache Spark Streaming. Get advice and tips from experienced pros sharing their opinions. Updated: June 2026.

DOWNLOAD NOW

902,495 professionals have used our research since 2012.

What do I think about the stability of the solution?

It's quite stable and reliable for our use cases.

What do I think about the scalability of the solution?

Regardless of my case, Spark is scalable enough.

What about the implementation team?

We deliver through our own pipelines, with involvement from a few team members.

What other advice do I have?

I rate the solution nine out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Srikanth Bhuvanagiri

Sr Technical Analyst at Sumtotal

Dec 12, 2022

Very fast with low latency data on data transformations

Pros and Cons

"It's the fastest solution on the market with low latency data on data transformations."

"The initial setup is quite complex."

What is our primary use case?

The primary use case of this solution is for streaming data. It can stream large amounts of data in small data chunks which are used for Databricks data. I've been using the solution for personal research purposes only and not for business applications. I'm a customer of Apache.

What is most valuable?

Data streaming would be the best feature of Spark and that includes when it's compared to Hadoop or Hive or Cassandra. It's the fastest solution on the market with low latency data on data transformations. I like that it's open source and easy to integrate with other data sources.

What needs improvement?

The initial setup is quite complex.

For how long have I used the solution?

I've been using this solution for six months.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

The solution is easily scalable in the cloud as per the limitations of the subscription.

Which solution did I use previously and why did I switch?

I have previously used a variety of different streaming platforms. I have written a paper analyzing various solutions for efficient streaming for cluster analysis and have published it. I found that Spark has the most features and is the quickest solution compared to the others when it comes to the transformation of data without any latencies or issues.

How was the initial setup?

It can be installed with a few commands; I installed it in a Linux environment. That said, the initial setup is complex because we have to learn either Java or Scala. Spark Streaming has a few features in GitHub and its libraries, so we need to write some code (methods or functions) to integrate with data sources, and then try to run those integrations. It may be that only an experienced programmer familiar with Scala and Java can implement it. There are quite extensive prerequisites for using it properly.

What's my experience with pricing, setup cost, and licensing?

I'm using the open-source version of Spark, so there are no licensing costs.

What other advice do I have?

It's important to be familiar with Spark Streaming and the Spark libraries, because familiarity with those scripts and coding languages makes it easier to work within the Spark ecosystem when building Spark Streaming integrations or creating Spark clusters.

I rate this solution eight out of 10.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer1494531

Head of Data Science at an energy/utilities company with 10,001+ employees

Apr 11, 2022

Open-source, reliable, and enterprise-ready

Pros and Cons

"As an open-source solution, using it is basically free."

"We would like to have the ability to do arbitrary stateful functions in Python."

What is our primary use case?

We're primarily using the solution for anomaly detection.

What is most valuable?

I like that it's Python. We have a Python ecosystem. Therefore, it fits perfectly.

The initial setup is simple.

The solution can scale.

It's a stable product.

As an open-source solution, using it is basically free.

What needs improvement?

We would like to have the ability to do arbitrary stateful functions in Python.
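
As a rough illustration of what an arbitrary stateful function does, the following pure-Python sketch keeps per-key state across micro-batches and flags unusual readings. It is not Spark's API and not the reviewer's implementation; `update_state`, the key names, and the threshold `factor` are hypothetical and chosen only to show the pattern in an anomaly-detection setting.

```python
from typing import Dict, List, Tuple

# Hypothetical per-key state: key -> (number of readings, running sum).
State = Dict[str, Tuple[int, float]]


def update_state(
    state: State, batch: List[Tuple[str, float]], factor: float = 3.0
) -> List[Tuple[str, float]]:
    """Fold one micro-batch into the per-key state and return flagged readings."""
    flagged = []
    for key, value in batch:
        count, total = state.get(key, (0, 0.0))
        # Flag a reading that exceeds `factor` times the mean seen so far for this key.
        if count > 0 and value > factor * (total / count):
            flagged.append((key, value))
        # The state survives between batches, which is what makes the function stateful.
        state[key] = (count + 1, total + value)
    return flagged


state: State = {}
first = update_state(state, [("pump-1", 10.0), ("pump-1", 12.0)])
second = update_state(state, [("pump-1", 50.0), ("pump-2", 5.0)])
print(first, second)
```

In a streaming engine the state would also have to be checkpointed and distributed across workers; this sketch leaves that out.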

For how long have I used the solution?

We started using the solution half a year ago.

What do I think about the stability of the solution?

The solution is stable. There are no bugs or glitches. It doesn't crash or freeze.

What do I think about the scalability of the solution?

It's enterprise-ready. It's very scalable.

As we're using it mostly for data science types of activities, there are maybe eight active users.

How are customer service and support?

While we're purchasing external consulting to support us, the documentation is pretty good.

Which solution did I use previously and why did I switch?

We tried Flink; however, it was not satisfactory.

How was the initial setup?

It's not a complex implementation. In our case, it was easy as we have a hosted environment. The deployment only takes a couple of minutes.

What's my experience with pricing, setup cost, and licensing?

The solution is open-source. That's pretty reasonable. It's basically free.

What other advice do I have?

We are a customer and end-user.

We're using it in Azure, in Databricks. I don't know the exact version of Spark I'm using; it's one of the recent ones.

I would rate the product an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Oscar Estorach

Chief Data Strategist And Director at theworkshop.es

Aug 19, 2021

Mature and stable with good scalability

Pros and Cons

"The solution is very stable and reliable."
"It is the most scalable tool that I have seen before."

"The solution itself could be easier to use."

What is most valuable?

The solution is very stable and reliable. It's quite mature.

The solution scales very well.

What needs improvement?

The installation is difficult. You definitely need more than one person. That said, if you are implementing it in the cloud, it's easier.

The solution itself could be easier to use.

The solution is free to use as it is open-source.

For how long have I used the solution?

I've used the solution for a while. I use it every day. However, it depends on the project.

What do I think about the stability of the solution?

The solution is stable. It's not a new tool. It's quite mature. It's been on the market for many years. We found that especially version 3.0.1 is a good, stable version.

What do I think about the scalability of the solution?

The solution is quite scalable. It is the most scalable tool that I have seen before.

We have five people using the solution.

Which solution did I use previously and why did I switch?

I did not previously use a different solution. I worked in data warehousing around 20 years ago, and I used a batch system built on Oracle Database, which is not a scalable system like Spark.

How was the initial setup?

The initial setup is quite involved. Streaming is a huge system. It's different. You need to use another part of the code; however, it's not a great deal. People who work in data need to micro-batch. You need other tools, such as Hadoop, Data Lake, or Kafka, to control the data.

It's not easy to install. Not all products are open-source. It is not easy to implement on-premise as you need maybe two technical persons to maintain the system. If you put it in the cloud, it's easier.

What's my experience with pricing, setup cost, and licensing?

It's less expensive to use the cloud. Using on-premises is more costly. Spark is open-source and doesn't actually cost us anything.

What other advice do I have?

It's cheaper for companies to use cloud systems; however, you can implement it on-premise.

We use the cloud. As it is the cloud, it's always on the latest version and updates itself regularly.

I would rate the product at a nine out of ten. It's very good in terms of its capabilities and I have been very happy with it.

I would recommend the solution to other users.

Which deployment model are you using for this solution?

Public Cloud

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

RajeevKumar10

DevOps engineer at Vvolve management consultants

Jun 7, 2024

Handles large datasets and is relatively easy to manage, especially with cloud technologies, but scalability features could be enhanced

Pros and Cons

"Apache Spark's capabilities for machine learning are quite extensive and can be used in a low-code way."

"The debugging aspect could use some improvement."

What is our primary use case?

I've used it more for ETL. It's useful for creating data pipelines, streaming datasets, generating synthetic data, synchronizing data, and creating data lakes, and loading and unloading data is fast and easy.

In my ETL work, I often move data from multiple sources into a data lake. Apache Spark is very helpful for tracking the latest data delivery and automatically streaming it to the target database.

How has it helped my organization?

Apache Spark is a versatile technology useful not only for data solutions but also for data creation. This is especially valuable given GDPR regulations and limited access to production, which make tasks like testing quite difficult. It helps with data creation and alignment for both consumers and developers.

What is most valuable?

Apache Spark Streaming is particularly good at handling real-time data. It has built-in data streaming integration, which allows it to stream data from any source as soon as it becomes available.

What needs improvement?

The scalability features are already good, but they could be further enhanced. Additionally, the debugging aspect could use some improvement.

What do I think about the stability of the solution?

The stability is very good. Since everything runs as code, it's easy to understand what's happening under the hood. It's not a closed-box system, which makes it quite transparent.

What do I think about the scalability of the solution?

On my team, there are about six or seven people using it. However, on the analytics side, where users view the reports, there are many more, perhaps over a hundred.

How was the initial setup?

The deployment process is quite easy and not very complicated.

Since it's an open-source technology, it can be deployed in various environments, including local machines and all kinds of clouds. If you're using the cloud, scaling is quite easy.

What about the implementation team?

If there are knowledgeable, experienced team members, it doesn't require a large team. One or two developers are enough.

What was our ROI?

It can handle large datasets and is relatively easy to manage, especially with cloud technologies. This means you can process a lot of data even with a low-configuration environment, which helps with cost savings.

What other advice do I have?

I would rate it a seven out of ten. Apache Spark's capabilities for machine learning are quite extensive and can be used in a low-code way. This can be much more efficient than using various technologies. You can also combine its batch processing capabilities with new technologies and machine learning.

It's quite useful for AI because of its machine-learning capabilities, which allow for model training and output generation.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer1516182

Chief Innovation & Technology Leader at a mining and metals company with 1,001-5,000 employees

Mar 22, 2021

Efficient, better than average, but overly developer-focused

Pros and Cons

"The solution is better than average and some of the valuable features include efficiency and stability."

"There could be an improvement in the area of the user configuration section, it should be less developer-focused and more business user-focused."

What is our primary use case?

The primary use of the solution is to implement predictive maintenance qualities.

What is most valuable?

The solution is better than average and some of the valuable features include efficiency and stability.

What needs improvement?

There could be an improvement in the area of the user configuration section; it should be less developer-focused and more business user-focused. For example, it is still not as plug-and-play as some of the cloud offerings that come ready to use. It is not up there at the leading edge.

For how long have I used the solution?

I have been using this solution for approximately one and a half years.

What do I think about the stability of the solution?

The solution is very stable.

How was the initial setup?

The initial setup is developer-focused but it is not very complex. I can set up a stream in less than an hour. It will stream, but it will not be a production-ready stream.

What other advice do I have?

I rate Apache Spark Streaming a six out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer2392494

Enterprise Data Architect at a pharma/biotech company with 11-50 employees

May 26, 2024

Provides real-time data processing capabilities with efficient reliability

Pros and Cons

"The platform’s most valuable feature for processing real-time data is its ability to handle continuous data streams."

"Integrating event-level streaming capabilities could be beneficial."

What is most valuable?

The platform’s most valuable feature for processing real-time data is its ability to handle continuous data streams.

What needs improvement?

The product's event handling capabilities, particularly compared to Kafka, need improvement. Integrating event-level streaming capabilities could be beneficial. This aligns with the idea of expanding Spark's functionality to cover unaddressed areas, potentially enhancing its competitiveness.

For how long have I used the solution?

We have been using Apache Spark Streaming for five years.

What's my experience with pricing, setup cost, and licensing?

Spark is an affordable solution, especially considering its open-source nature. However, it could use support from experienced companies to resolve any issues effectively.

What other advice do I have?

Spark does not encounter integration issues, particularly due to its utilization of JDBC connectors. These connectors facilitate seamless integration with third-party solutions. Furthermore, successful integration with tools like SAP HANA indicates its versatility in handling various data sources. Additionally, its performance surpasses Informatica in certain scenarios, especially when real-time streaming capabilities are crucial. It remains a preferred choice for businesses requiring efficient real-time data processing.

I rate it an eight.

Disclosure: My company has a business relationship with this vendor other than being a customer. Partner