
StreamSets pros and cons

Vendor: IBM
4.2 out of 5

Pros & Cons summary


Prominent pros & cons

PROS

StreamSets significantly reduces the time required to fix data drift, completing migration tasks that previously took an hour to an hour and a half in just 15 minutes.
The data drift feature alerts users up front that data can still be ingested, and schema or data type changes land in the data lake automatically, so they can be addressed in downstream processing.
Data Collector and Control Hub in StreamSets are user-friendly, making it accessible for those without a technical background to build data pipelines.
StreamSets offers robust integration options with a variety of protocols, languages, and origins, supporting various data media formats.
Its plugins and numerous ready connectors are particularly useful, providing ease of configuration and flexibility for managing data sources.

CONS

StreamSets needs broader integration capabilities beyond Java, particularly for .NET and other frameworks.
Real-time processing improvements are necessary, as current batch processing capabilities do not meet low latency requirements.
Issues like memory leaks and inefficient logging mechanisms complicate problem resolution.
Simplified multi-table reading from SAP HANA is lacking, necessitating manual configurations.
Documentation and support need improvement, particularly more regular updates to the documentation and knowledge base and faster help with advanced technical issues.

StreamSets Pros review quotes

Enterprise Solutions Architect at an energy/utilities company with 1,001-5,000 employees
Apr 2, 2025
StreamSets is the leader in the market.
Saket Pandey - PeerSpot reviewer
Product Manager at a hospitality company with 51-200 employees
May 17, 2023
The ability to have a good bifurcation rate and fewer mistakes is valuable.
Ved Prakash Yadav - PeerSpot reviewer
Senior Data Platform Manager at a manufacturing company with 10,001+ employees
Apr 10, 2024
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up.
Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: April 2026.
893,221 professionals have used our research since 2012.
Nantabo Jackie - PeerSpot reviewer
Sales Manager at Soft Hostings Limited
Mar 24, 2023
The most valuable features are the option of integration with a variety of protocols, languages, and origins.
Reyansh Kumar - PeerSpot reviewer
Technical Specialist at Accenture
Mar 10, 2023
The scheduling within the data engineering pipeline is very much appreciated, and it has a wide range of connectors for connecting to any data sources like SQL Server, AWS, Azure, etc. We have used it with Kafka, Hadoop, and Azure Data Factory Datasets. Connecting to these systems with StreamSets is very easy.
Enterprise Solutions Architect at an energy/utilities company with 1,001-5,000 employees
Jun 9, 2022
StreamSets' data drift feature gives us an alert up front, so we know that the data can be ingested. Whatever the schema or data type changes, the data lands automatically in the data lake without any intervention from us, but that information is then crucial for fixing the downstream pipelines that process the data into models, such as Tableau and Power BI models. This is very useful for us, and we are already seeing benefits. Our pipelines used to break when there were data drift changes, and we needed to spend about a week fixing them. Now, we are saving one to two weeks. Though it depends on the complexity of the pipeline, we are definitely saving a lot of time.
Ramesh Kuppuswamy - PeerSpot reviewer
Senior Software Developer at a tech vendor with 10,001+ employees
Jan 6, 2023
The ETL capabilities are very useful for us. We extract and transform data from multiple data sources into a single, consistent data store, and then we put it into our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems.
Karthik Rajamani - PeerSpot reviewer
Principal Engineer at Tata Consultancy Services
Jun 14, 2022
I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they are very user-friendly. People who do not come from a technology or core development background find it easy to get started, build data pipelines, and connect to databases. Within a couple of weeks, they are as comfortable as any technical person.
AbhishekKatara - PeerSpot reviewer
Technical Lead at Sopra Steria
May 15, 2022
StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, we created Sqoop-based processes to move data from sources to destinations. They got the job done, but with Hadoop that took approximately an hour to an hour and a half. With StreamSets, since it works on a Data Collector-based mechanism, the same process completes in 15 minutes. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer time, including the drift part, by 45 minutes.
Software Engineer at Soft Hostings Limited
Sep 18, 2024
What I love the most is that StreamSets is very light. It's a containerized application, so it's easy to use with Docker. If you are a large organization, it's also very easy to run on Kubernetes.

StreamSets Cons review quotes

Enterprise Solutions Architect at an energy/utilities company with 1,001-5,000 employees
Apr 2, 2025
One issue I observed with StreamSets is that the memory runs out quickly when processing large volumes of data. Because of this memory issue, we have to upgrade our EC2 boxes in the Amazon AWS infrastructure.
Saket Pandey - PeerSpot reviewer
Product Manager at a hospitality company with 51-200 employees
May 17, 2023
One thing that I would like to see added is the ability to manually enter data. The way the solution currently works, we don't have the option to manually change the data at any point in time. Being able to do that would allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we had the option to manually enter the data or make exact iterations on the data set, that would be a good thing.
Ved Prakash Yadav - PeerSpot reviewer
Senior Data Platform Manager at a manufacturing company with 10,001+ employees
Apr 10, 2024
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drifting wasn't addressed, which made things worse. Licensing was another issue we encountered.
Nantabo Jackie - PeerSpot reviewer
Sales Manager at Soft Hostings Limited
Mar 24, 2023
The documentation is inadequate and has room for improvement, because the technical support team does not regularly update the documentation or the knowledge base.
Reyansh Kumar - PeerSpot reviewer
Technical Specialist at Accenture
Mar 10, 2023
They need to improve their customer care services. Sometimes it has taken more than 48 hours to resolve an issue, and that should be reduced. They are familiar with small or generic issues, but not with the more technical or deep issues. For those, they generally take 48 to 72 hours to respond. That should be improved.
Enterprise Solutions Architect at an energy/utilities company with 1,001-5,000 employees
Jun 9, 2022
Currently, we can only use a query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables in SAP HANA. That is something we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, you can just point StreamSets at the schema or at the 100 tables and ingest the information. However, you can't do that with SAP HANA, because StreamSets currently does not have a multi-table feature for it. Therefore, a multi-table origin for SAP HANA would be helpful.
Ramesh Kuppuswamy - PeerSpot reviewer
Senior Software Developer at a tech vendor with 10,001+ employees
Jan 6, 2023
The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information.
Karthik Rajamani - PeerSpot reviewer
Principal Engineer at Tata Consultancy Services
Jun 14, 2022
We create pipelines and jobs in StreamSets Control Hub. It is a great feature, but it would be great if there were a way to organize the pipelines and jobs in Control Hub, for example with a folder structure. I submitted a ticket for this some time back.
AbhishekKatara - PeerSpot reviewer
Technical Lead at Sopra Steria
May 15, 2022
The logging mechanism could be improved. If I work on a pipeline, create a job from it, and run the job, it generates logs constantly. The logging mechanism could be simplified. Currently, it is a bit difficult to understand and filter the logs, and it takes some time.
Software Engineer at Soft Hostings Limited
Sep 18, 2024
There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline.