

Pentaho Data Integration and StreamSets are leading solutions competing in the data integration and analytics field. Pentaho is highly regarded for its extensive database connectivity and big data support, while StreamSets is preferred for its simplicity in building pipelines and handling data drift.
Features: Pentaho Data Integration offers extensive database connectivity, a graphical interface for ease of use, and excellent support for data transformation and big data technologies. StreamSets stands out for its user-friendly interface, ease of building pipelines without extensive coding, and robust capabilities for handling data drift and change data capture.
Room for Improvement: Pentaho users highlight limited performance with big data, outdated scheduling tools, a less user-friendly interface, and a lack of enterprise support in the community version. StreamSets could improve its pipeline visualization and error logging, and offer more flexible pricing and better security features.
Ease of Deployment and Customer Service: Pentaho is easily deployable in on-premise environments but faces scalability challenges in the cloud, with users relying heavily on community forums for support. StreamSets excels with flexible deployment options across on-premise, cloud, and hybrid environments, and is praised for its professional support and problem-solving capabilities.
Pricing and ROI: Pentaho's open-source community edition provides notable cost savings, making it attractive to budget-conscious users, while its enterprise edition is more costly. StreamSets offers a free version but its paid model can be expensive for smaller businesses. Larger enterprises find it offers a good ROI due to improved data processing times and resource savings, despite higher costs.
I have seen a return on investment; my team was able to stay extremely small even though we had a lot of data integrations with many companies.
I can testify to the return on investment with metrics regarding time saved; we have increased our efficiency by about 20 to 30 percent due to the swift migration processes facilitated by the tool.
24/7 assistance is available for the Enterprise Edition.
They take the time to understand our business requirements and offer appropriate recommendations.
Communication with the vendor is challenging.
IBM technical support sometimes transfers tickets between different teams due to shift changes, which can be frustrating.
It can be scaled well until you reach a point where you need to perform a lot of operations, and the issue arises when it runs out of memory to handle some data.
Pentaho Data Integration handles larger datasets better.
Pentaho Data Integration and Analytics' scalability is commendable, as it allows us to scale up according to our needs.
Performance issues arise due to reliance on a flowchart-based mechanism instead of scripts, which can lead to longer execution times.
I find that version 3.1 is the most stable version I have ever used.
It's pretty stable; however, it struggles when dealing with smaller amounts of data.
We should also explore more effective partitioning for parallel processing and fine-tuning database connections to reduce load times and improve ETL speed.
Pentaho Data Integration and Analytics could be improved in how it works across different environments: I would like to define my variables only once and then change their values for each environment, such as production or development.
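The "define once, change per environment" idea in this review can be illustrated with a minimal Python sketch. It is not Pentaho's own mechanism; the variable names, values, and the `resolve_variables` helper are all hypothetical and only show the general pattern of a shared base plus per-environment overrides.

```python
# Hypothetical variable names and values, for illustration only.
# Every variable is declared once here with its default value.
BASE_VARIABLES = {
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "OUTPUT_DIR": "/tmp/etl/out",
}

# Only the values that differ from the base are listed per environment.
ENVIRONMENT_OVERRIDES = {
    "development": {},
    "production": {
        "DB_HOST": "db.prod.example.com",
        "OUTPUT_DIR": "/data/etl/out",
    },
}


def resolve_variables(environment: str) -> dict[str, str]:
    """Return the base variables with one environment's overrides applied."""
    if environment not in ENVIRONMENT_OVERRIDES:
        raise ValueError(f"unknown environment: {environment}")
    # Later keys win, so overrides replace the base values.
    return {**BASE_VARIABLES, **ENVIRONMENT_OVERRIDES[environment]}
```

Calling `resolve_variables("production")` would then give the production host and output directory while keeping the shared port, so a job definition never needs to be edited when it moves between environments.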
I also lack the option to use programming languages beyond Python and SQL, and a provision to incorporate Scala code in the scripting component would be beneficial.
It would be beneficial if StreamSets addressed any potential memory leak issues to prevent unnecessary upgrades.
I use the community version of Pentaho Data Integration and Analytics, so I do not incur any additional costs.
The setup cost was minimal, and the pricing experience was pretty good.
Pentaho Data Integration and Analytics has positively impacted my organization because it meant we didn't have to write a lot of custom API back-end processing logic; it did the majority of that heavy lifting for us.
It automates the data workflow, including extraction, cleansing, and loading into warehouses for BI reporting purposes, while also removing duplicates, validating data, and standardizing formats, enabling real-time decision-making.
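The cleansing steps this review mentions (standardizing formats, validating data, removing duplicates) can be sketched in plain Python. This is a generic illustration, not how Pentaho implements them; the record fields (`email`, `signup_date`), the input date format, and the validation rule are assumptions made for the example.

```python
from datetime import datetime


def standardize(record):
    """Trim whitespace, lowercase the email, and convert the date to ISO 8601."""
    parsed = datetime.strptime(record["signup_date"].strip(), "%d/%m/%Y")
    return {
        "email": record["email"].strip().lower(),
        "signup_date": parsed.date().isoformat(),
    }


def is_valid(record):
    """Deliberately simple validation rule: the email must contain '@'."""
    return "@" in record["email"]


def clean(records):
    """Standardize, validate, and de-duplicate records, keeping first occurrences."""
    seen_emails = set()
    cleaned = []
    for raw in records:
        try:
            record = standardize(raw)
        except (KeyError, ValueError):
            continue  # skip records with missing fields or unparseable dates
        if not is_valid(record) or record["email"] in seen_emails:
            continue  # skip invalid records and duplicates
        seen_emails.add(record["email"])
        cleaned.append(record)
    return cleaned
```

Standardizing before de-duplicating matters here: two rows that differ only in case or surrounding whitespace are treated as the same record.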
Pentaho Data Integration and Analytics has positively impacted my organization because it is easier to use, and my knowledge about this work facilitates the translation from the source to my final system.
It allows a hybrid installation approach, rather than being completely cloud-based or on-premises.
| Product | Market Share (%) |
|---|---|
| Pentaho Data Integration and Analytics | 1.5% |
| StreamSets | 1.2% |
| Other | 97.3% |

| Company Size | Count |
|---|---|
| Small Business | 18 |
| Midsize Enterprise | 18 |
| Large Enterprise | 29 |

| Company Size | Count |
|---|---|
| Small Business | 9 |
| Midsize Enterprise | 2 |
| Large Enterprise | 11 |
Pentaho Data Integration is a versatile platform for the data integration and analytics needs of organizations of any size. It integrates data from diverse sources, including databases, files, and applications, and handles the cleaning and transformation needed to prepare that data for analysis. Its tools for data mining, machine learning, and statistical analysis help organizations draw insights from their data. Its maturity and active community of users and developers make it a reliable and cost-effective option. Features include a comprehensive ETL toolkit, data cleaning and transformation capabilities, data analysis tools, and deployment options for data integration and analytics solutions.
StreamSets is a data integration platform that enables organizations to efficiently move and process data across various systems. It offers a user-friendly interface for designing, deploying, and managing data pipelines, allowing users to easily connect to various data sources and destinations. StreamSets also provides real-time monitoring and alerting capabilities, ensuring that data is flowing smoothly and any issues are quickly addressed.