We use Spark Streaming in micro-batch mode. It's not a full real-time system, but it offers high performance and low latency.
Head of Data Science center of excellence at Ameriabank CJSC
Boosts performance with micro-batch streaming and detailed documentation
Pros and Cons
- "Spark Streaming is critical, quite stable, full-featured, and scalable."
- "We don't have enough experience to be judgmental about its flaws."
What is our primary use case?
What is most valuable?
Spark Streaming is critical, quite stable, full-featured, and scalable. It has a low latency and high performance, comparable to functions that can be called by triggers. It is well-designed with good documentation, making it easy to find solutions.
What needs improvement?
We don't have enough experience to be judgmental about its flaws, as we've only used stable features like micro-batching. Integration poses no problem; however, I don't use some features and can't judge those.
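To make the micro-batch idea concrete, here is a minimal plain-Python sketch. It is not Spark's API; the function name `micro_batches`, the sample events, and the one-second interval are all assumptions chosen for illustration. It only shows the principle of cutting a stream of timestamped events into small, fixed-interval batches that are then processed one at a time.

```python
from itertools import groupby


def micro_batches(events, interval_s):
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    ordered = sorted(events, key=lambda e: e[0])
    # Events whose timestamps fall in the same interval share a batch id.
    for batch_id, group in groupby(ordered, key=lambda e: int(e[0] // interval_s)):
        yield batch_id, [value for _, value in group]


# Made-up sample events: (seconds since start, payload).
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d")]
batches = list(micro_batches(events, interval_s=1.0))

for batch_id, values in batches:
    print(batch_id, values)
```

Latency in this model is bounded below by the batch interval, which is why a micro-batch system is low-latency rather than fully real-time.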
For how long have I used the solution?
I have used it intensively for maybe one year.
Buyer's Guide
Apache Spark Streaming
September 2025

Learn what your peers think about Apache Spark Streaming. Get advice and tips from experienced pros sharing their opinions. Updated: September 2025.
868,759 professionals have used our research since 2012.
What do I think about the stability of the solution?
It's quite stable and reliable for our use cases.
What do I think about the scalability of the solution?
Regardless of my case, Spark is scalable enough.
What about the implementation team?
We use our pipelines to deliver, which is a mix of involvement from a few team members.
What other advice do I have?
The solution rates a nine out of ten.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Easy deployment as a cluster and good documentation
Pros and Cons
- "Apache Spark Streaming was straightforward in terms of maintenance. It was actively developed, and migrating from an older to a newer version was quite simple."
- "It was resource-intensive, even for small-scale applications."
What is our primary use case?
We used Spark and Spark Streaming, as well as Spark ML, for multiple use cases, particularly streaming IoT-related data. Additionally, we applied Spark ML for various machine learning algorithms on the streaming data, mainly in the healthcare space. So, primarily in the healthcare domain.
What is most valuable?
With Spark Streaming, there was native Python support, which was beneficial for us. It was easy to deploy as a cluster, and the website was user-friendly. The documentation was also pretty good, and there was strong community support. Overall, it was considered an industry standard at the time.
What needs improvement?
In terms of disadvantages, it was a bit cumbersome due to its size. It wasn't quite cloud-native back then, meaning it wasn't easy to deploy it in a Kubernetes cluster and similar environments. I found it a bit challenging, but I'm not sure if that's still the case now. It probably has better support.
It was on-prem, and when we wanted to migrate it to the cloud, especially to Kubernetes, I remember facing some difficulties in successfully migrating the system.
For how long have I used the solution?
I explored it as part of a pilot project some time ago. We were using Spark Streaming, and I explored Pulsar as a replacement for Spark Streaming for that use case. Overall, I've used Spark Streaming for around five years or so.
What do I think about the stability of the solution?
It is a stable solution.
What do I think about the scalability of the solution?
Scalability is pretty good. However, I must mention that I haven't tested it extensively with large-scale production scenarios. The testing I conducted was more of a pilot nature, and the scale was not very high. But based on what I've read, scalability shouldn't be an issue.
In the pilot project, there were around a thousand users. I didn't encounter any issues while scaling to that level.
How are customer service and support?
I mainly relied on the documentation and community support. There was sufficient support available for me during various times. I didn't actually contact Apache for any support-related activities.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
Spark Streaming was more widely used and had better documentation. It had frequent releases and active development compared to Storm, which had limited language support and stopped active development at some point. Spark Streaming also had top-level consultant support, which was beneficial for the team I was working with. That's why I made a switch.
How was the initial setup?
It was easy to install. I didn't find any difficulties while installing and trying it out, at least on a smaller scale.
Apache Spark Streaming was straightforward in terms of maintenance. It was actively developed, and migrating from an older to a newer version was quite simple. That was the main aspect of maintenance, and overall, it was a straightforward process. The documentation was good, and there was good community support. So I didn't face any problems while deploying and maintaining the solution.
What's my experience with pricing, setup cost, and licensing?
I was using the open-source community version, which was self-hosted. I'm not familiar with the pricing of the commercial version.
Which other solutions did I evaluate?
I had previously used Apache Storm, which is an open-source solution. I later switched to Spark Streaming and also tried Pulsar for similar use cases in the healthcare domain.
What other advice do I have?
I would highly recommend Spark Streaming for standard streaming or IoT use cases. The entire Spark ecosystem, including Spark Core, streaming, ML, and other components, can be highly beneficial. It's better to stick with the Spark ecosystem rather than use other platforms and frameworks. For streaming and IoT, Spark Streaming is a great choice.
Overall, I would rate the solution an eight out of ten. The only issue I found, at least during the time I actively worked with it, was that it was resource-intensive, even for small-scale applications. In comparison, some other platforms, like Pulsar, had lighter resource consumption and performed better, at least to begin with, in terms of resource usage and the associated costs. Spark Streaming is a bit heavy and resource-intensive at the start, which is why I rate it an eight.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Data Engineer at a comms service provider with 201-500 employees
A robust solution that is configurable based on one's requirements with features like checkpointing and API
Pros and Cons
- "Apache Spark Streaming has features like checkpointing and Streaming API that are useful."
- "The cost and load-related optimizations are areas where the tool lacks and needs improvement."
What is our primary use case?
The solution has industry-related use cases, with orders flowing from the order management system. We use Apache Spark Streaming to collect and store these orders in our database.
How has it helped my organization?
Before the introduction of Apache Spark Streaming, we primarily relied on cloud-related tools. Apache Spark Streaming is a robust solution configurable based on our requirements. The batches we deal with are very use-case-specific, and we tune in those batches accordingly.
What is most valuable?
Apache Spark Streaming has features like checkpointing and Streaming API that are useful.
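As a rough illustration of what checkpointing buys you, the plain-Python sketch below saves the next offset to a file after each record and resumes from it on restart. It is not Spark's implementation or API; `process_with_checkpoint`, the JSON file layout, the `stop_after` crash simulation, and the sample order IDs are all hypothetical.

```python
import json
import os
import tempfile


def process_with_checkpoint(records, checkpoint_path, handle, stop_after=None):
    """Process records from the last saved offset, persisting progress as we go.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_offset"]

    processed = 0
    for offset in range(start, len(records)):
        if stop_after is not None and processed >= stop_after:
            break  # simulate the job stopping partway through
        handle(records[offset])
        # Record the next offset so a restart does not reprocess this record.
        with open(checkpoint_path, "w") as f:
            json.dump({"next_offset": offset + 1}, f)
        processed += 1


# Made-up orders standing in for messages from an order management system.
orders = ["order-1", "order-2", "order-3", "order-4"]
stored = []
checkpoint_file = os.path.join(tempfile.mkdtemp(), "checkpoint.json")

process_with_checkpoint(orders, checkpoint_file, stored.append, stop_after=2)  # interrupted run
process_with_checkpoint(orders, checkpoint_file, stored.append)  # restart resumes at offset 2
print(stored)
```

The second call picks up where the first stopped, so each order is stored exactly once in this simplified setting.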
What needs improvement?
Apache Spark Streaming relies on native integration of some libraries for cost- and load-related optimizations. These optimizations are an area where the tool is lacking and needs improvement.
For how long have I used the solution?
I have been using Apache Spark Streaming for a year. We use Apache Spark Streaming 3 in our company.
What do I think about the stability of the solution?
The solution is stable.
I rate the solution's stability an eight out of ten.
What do I think about the scalability of the solution?
The solution's scalability is very good. We have more than ten teams with around 100 consumers.
We have some alternative solutions.
I rate the solution's scalability a nine out of ten.
How are customer service and support?
We have not been in touch with Apache's support team. For support, we use the information available to the public.
Which solution did I use previously and why did I switch?
We have used Apache NiFi before in my company.
How was the initial setup?
I rate the initial setup a five on a scale from one to ten, where one is difficult, and ten is easy.
Apache Spark Streaming's deployment usually takes two to three minutes.
The solution deployment is a fully automated process, and we have a CI/CD process in place, so we trigger Jenkins Pipeline for the deployment.
The solution is deployed on a hybrid cloud.
The deployment can be done with just one click, so not even a person is needed for deployment.
What's my experience with pricing, setup cost, and licensing?
On a scale from one to ten, where one is expensive, or not cost-effective, and ten is cheap, I rate the price a seven.
What other advice do I have?
Apache Spark Streaming has very specific use cases and needs to be evaluated based on the needs of an individual before choosing it.
Overall, I rate the solution an eight out of ten.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Sr Technical Analyst at Sumtotal
Very fast with low latency on data transformations
Pros and Cons
- "It's the fastest solution on the market with low latency on data transformations."
- "The initial setup is quite complex."
What is our primary use case?
The primary use case of this solution is for streaming data. It can stream large amounts of data in small data chunks which are used for Databricks data. I've been using the solution for personal research purposes only and not for business applications. I'm a customer of Apache.
What is most valuable?
Data streaming would be the best feature of Spark, and that includes when it's compared to Hadoop, Hive, or Cassandra. It's the fastest solution on the market with low latency on data transformations. I like that it's open source and easy to integrate with other data sources.
What needs improvement?
The initial setup is quite complex.
For how long have I used the solution?
I've been using this solution for six months.
What do I think about the stability of the solution?
The solution is stable.
What do I think about the scalability of the solution?
The solution is easily scalable in the cloud as per the limitations of the subscription.
Which solution did I use previously and why did I switch?
I have previously used a variety of different streaming platforms. I have written a paper analyzing various solutions for efficient streaming for cluster analysis and have published it. I found that Spark has the most features and is the quickest solution compared to the others when it comes to the transformation of data without any latencies or issues.
How was the initial setup?
With a few commands it's possible to install. I installed it in a Linux environment. That said, the initial setup is complex because we have to learn either Java or Scala. Spark Streaming has a few features in GitHub and its libraries, so we need to write some code to maintain the methods or functions that integrate with any data sources, and then try to run those integrations. It may be that only a high-level programmer familiar with Scala and Java can implement it. There are quite extensive prerequisites for using it properly.
What's my experience with pricing, setup cost, and licensing?
I'm using the open-source version of Spark, so there are no licensing costs.
What other advice do I have?
It's important to be familiar with Spark Streaming and the Spark libraries, because familiarity with those scripts and coding languages makes it easier to work within the Spark ecosystem, whether you are building Spark Streaming integrations or creating Spark clusters.
I rate this solution eight out of 10.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Head of Data Science at a energy/utilities company with 10,001+ employees
Open-source, reliable, and enterprise-ready
Pros and Cons
- "As an open-source solution, using it is basically free."
- "We would like to have the ability to do arbitrary stateful functions in Python."
What is our primary use case?
We're primarily using the solution for anomaly detection.
What is most valuable?
I like that it's Python. We have a Python ecosystem. Therefore, it fits perfectly.
The initial setup is simple.
The solution can scale.
It's a stable product.
As an open-source solution, using it is basically free.
What needs improvement?
We would like to have the ability to do arbitrary stateful functions in Python.
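For readers unfamiliar with the term, the plain-Python sketch below shows what an arbitrary stateful function means in a streaming context: a user-defined function that keeps per-key state across batches and decides what to emit. It is not Spark's API; `update_state`, `run`, the threshold, and the sensor readings are hypothetical and chosen to echo the anomaly-detection use case.

```python
def update_state(state, value, threshold=3.0):
    """Update a per-key (count, running mean) state and flag large deviations.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    count, mean = state if state is not None else (0, 0.0)
    # A value is flagged when it is far from the mean seen so far for this key.
    is_anomaly = count > 0 and abs(value - mean) > threshold
    new_count = count + 1
    new_mean = mean + (value - mean) / new_count
    return (new_count, new_mean), is_anomaly


def run(batches):
    """Feed batches of (key, value) pairs through the stateful function."""
    states = {}
    anomalies = []
    for batch in batches:
        for key, value in batch:
            states[key], flagged = update_state(states.get(key), value)
            if flagged:
                anomalies.append((key, value))
    return states, anomalies


# Made-up sensor readings arriving in two batches.
batches = [
    [("s1", 10.0), ("s2", 5.0)],
    [("s1", 11.0), ("s2", 20.0)],
]
states, anomalies = run(batches)
print(anomalies)
```

The key point is that the state survives from one batch to the next, which is what a stateless per-batch transformation cannot express.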
For how long have I used the solution?
We started using the solution half a year ago.
What do I think about the stability of the solution?
The solution is stable. There are no bugs or glitches. It doesn't crash or freeze.
What do I think about the scalability of the solution?
It's enterprise-ready. It's very scalable.
As we're using it mostly for data science types of activities, there are maybe eight active users.
How are customer service and support?
While we're purchasing external consulting to support us, the documentation is pretty good.
Which solution did I use previously and why did I switch?
We tried Flink; however, it was not satisfactory.
How was the initial setup?
It's not a complex implementation. In our case, it was easy as we have a hosted environment. The deployment only takes a couple of minutes.
What's my experience with pricing, setup cost, and licensing?
The solution is open-source. That's pretty reasonable. It's basically free.
What other advice do I have?
We are a customer and end-user.
We're using it in Azure, in Databricks. I don't know the exact version of Spark I'm using; it's one of the recent ones.
I would rate the product an eight out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Chief Data-strategist and Director at Theworkshop.es
Mature and stable with good scalability
Pros and Cons
- "The solution is very stable and reliable."
- "The solution itself could be easier to use."
What is most valuable?
The solution is very stable and reliable. It's quite mature.
The solution scales very well.
What needs improvement?
The installation is difficult. You definitely need more than one person. That said, if you are implementing the cloud, it's easier.
The solution itself could be easier to use.
The solution is free to use as it is open-source.
For how long have I used the solution?
I've used the solution for a while. I use it every day. However, it depends on the project.
What do I think about the stability of the solution?
The solution is stable. It's not a new tool. It's quite mature. It's been on the market for many years. We found that especially version 3.0.1 is a good, stable version.
What do I think about the scalability of the solution?
The solution is quite scalable. It is the most scalable tool that I have seen.
We have five people using the solution.
Which solution did I use previously and why did I switch?
I did not previously use a different solution. I've been working in data warehousing for around 20 years, and I used a batch system based on Oracle Database, which is not a scalable system like Spark.
How was the initial setup?
The initial setup is quite involved. Streaming is a huge system. It's different. You need to use another part of the code; however, it's not a great deal more. People who work in data need to micro-batch. You need other tools, such as Hadoop, a data lake, or Kafka, to control the data.
It's not easy to install. Not all products are open-source. It is not easy to implement on-premises, as you need maybe two technical people to maintain the system. If you put it in the cloud, it's easier.
What's my experience with pricing, setup cost, and licensing?
It's less expensive to use the cloud. Using on-premises is more costly. Spark is open-source and doesn't actually cost us anything.
What other advice do I have?
It's cheaper for companies to use cloud systems; however, you can implement it on-premises.
We use the cloud. As it is the cloud, it's always on the latest version and updates itself regularly.
I would rate the product at a nine out of ten. It's very good in terms of its capabilities and I have been very happy with it.
I would recommend the solution to other users.
Which deployment model are you using for this solution?
Public Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
DevOps engineer at Vvolve management consultants
Handles large datasets and is relatively easy to manage, especially with cloud technologies but scalability features could be enhanced
Pros and Cons
- "Apache Spark's capabilities for machine learning are quite extensive and can be used in a low-code way."
- "The debugging aspect could use some improvement."
What is our primary use case?
I've used it more for ETL. It's useful for creating data pipelines, streaming datasets, generating synthetic data, synchronizing data, and creating data lakes. Loading and unloading data is fast and easy.
In my ETL work, I often move data from multiple sources into a data lake. Apache Spark is very helpful for tracking the latest data delivery and automatically streaming it to the target database.
How has it helped my organization?
Apache Spark is a versatile technology useful not only for data solutions but also for data creation. This is especially valuable given GDPR regulations and limited access to production, which make tasks like testing quite difficult. It helps with data creation and alignment for both consumers and developers.
What is most valuable?
Apache Spark Streaming is particularly good at handling real-time data. It has built-in data streaming integration, which allows it to stream data from any source as soon as it becomes available.
What needs improvement?
The scalability features are already good, but they could be further enhanced. Additionally, the debugging aspect could use some improvement.
What do I think about the stability of the solution?
The stability is very good. Since everything runs as code, it's easy to understand what's happening under the hood. It's not a closed-box system, which makes it quite transparent.
What do I think about the scalability of the solution?
On my team, there are about six or seven people using it. However, on the analytics side, where users view the reports, there are many more, perhaps over a hundred.
How was the initial setup?
The deployment process is quite easy and not very complicated.
Since it's an open-source technology, it can be deployed in various environments, including local machines and all kinds of clouds. If you're using the cloud, scaling is quite easy.
What about the implementation team?
If there are knowledgeable, experienced team members, it doesn't require a large team. One or two developers are enough.
What was our ROI?
It can handle large datasets and is relatively easy to manage, especially with cloud technologies. This means you can process a lot of data even with a low-configuration environment, which helps with cost savings.
What other advice do I have?
I would rate it a seven out of ten. Apache Spark's capabilities for machine learning are quite extensive and can be used in a low-code way. This can be much more efficient than using various technologies. You can also combine its batch processing capabilities with new technologies and machine learning.
It's quite useful for AI because of its machine-learning capabilities, which allow for model training and output generation.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Chief Innovation & Technology Leader at a mining and metals company with 1,001-5,000 employees
Efficient, better than average, but overly developer-focused
Pros and Cons
- "The solution is better than average and some of the valuable features include efficiency and stability."
- "There could be an improvement in the area of the user configuration section, it should be less developer-focused and more business user-focused."
What is our primary use case?
The primary use of the solution is to implement predictive maintenance qualities.
What is most valuable?
The solution is better than average and some of the valuable features include efficiency and stability.
What needs improvement?
There could be an improvement in the area of the user configuration section; it should be less developer-focused and more business user-focused. For example, it is still not as plug-and-play as some of the cloud offerings that come ready to use. It is not up there at the leading edge.
For how long have I used the solution?
I have been using this solution for approximately one and a half years.
What do I think about the stability of the solution?
The solution is very stable.
How was the initial setup?
The initial setup is developer-focused, but it is not very complex. I can set up a stream in less than an hour. It will stream, but it will not be a production-ready stream.
What other advice do I have?
I rate Apache Spark Streaming a six out of ten.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
