Lead, Data and BI Architect at a financial services firm with 201-500 employees
We can use the same tool on all our environments. The patching is buggy.
Pros and Cons
- "Flexible deployment, in any environment, is very important to us. That is the key reason why we ended up with these tools. Because we have a very highly secure environment, we must be able to install it in multiple environments on multiple different servers. The fact that we could use the same tool in all our environments, on-prem and in the cloud, was very important to us."
- "The testing and quality could really improve. Every time that there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that, as even the latest one was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and do some effort, it is with the quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi."
What is our primary use case?
We run the payment systems for Canada. We use it as a typical ETL tool to transfer and transform data into a data warehouse. We have built many different pipelines with it.
How has it helped my organization?
I love the fact that we haven't come up with a problem yet that we haven't been able to address with this tool. I really appreciate its maturity and the breadth of its capabilities.
If we did not have this tool, we would probably have to use a variety of different tools, and our environment would be a lot more complicated.
We develop metadata pipelines and use them.
Flexible deployment, in any environment, is very important to us. That is the key reason why we ended up with these tools. Because we have a very highly secure environment, we must be able to install it in multiple environments on multiple different servers. The fact that we could use the same tool in all our environments, on-prem and in the cloud, was very important to us.
What is most valuable?
Because it comes from an open-source background, it has so many different plugins. It is extremely broad in what it can do, and I appreciate the very wide spectrum of things that it can connect to and do. It has been around for a while, so it is mature and has a lot of things built into it. That is the biggest thing.
The visual nature of its development is a big plus. You don't need to have very strong developers to be able to work with it.
We often have to drop down to JavaScript, but that is fine. I appreciate that it has the capability built-in. When you need to, you can drop down to a scripting language. This is important to us.
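For readers who have not seen that drop-down in practice: alongside the JavaScript step the reviewer mentions, PDI also ships a User Defined Java Class step with the same row-at-a-time model. The sketch below is only illustrative; the step generates the enclosing class and the helpers used here (getRow, get, putRow, createOutputRow), the "email" and "email_domain" field names are hypothetical, the new output field would have to be declared on the step, and details vary by PDI version.

```java
// Rough sketch of the method body pasted into a "User Defined Java Class" step.
public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException {
    Object[] r = getRow();                // read the next incoming row
    if (r == null) {                      // null means the input stream is finished
        setOutputDone();
        return false;
    }

    // Make sure the row array has room for any new fields this step adds.
    Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

    String email = get(Fields.In, "email").getString(r);
    String domain = (email != null && email.contains("@"))
            ? email.substring(email.indexOf('@') + 1)
            : null;
    get(Fields.Out, "email_domain").setValue(outputRow, domain);

    // Pass the row on to the next step in the transformation.
    putRow(data.outputRowMeta, outputRow);
    return true;
}
```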
What needs improvement?
The documentation is very basic.
The testing and quality could really improve. Every time there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that; even the latest release was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and effort, it is quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi.
For how long have I used the solution?
Overall, I have been using it for about 10 years. At my current organization, I have been using it for about seven years. It was used a little bit at my previous organization as well.
What do I think about the stability of the solution?
The stability is not great, especially when you start patching it a lot because things get broken. That is not a great look. When you start patching, you are expecting things to get fixed, not new things to get broken.
With modern programming, you build a lot of automated testing around your solution precisely for that reason: I changed this piece of code; what else got broken? Obviously, they don't have a lot of unit tests built into their code. They need to start doing that, because it looks terrible when they change one thing and two other things get broken. Then they release that as a commercial product, which is horrible. Last time, they somehow broke the ability to connect to databases. That is something incredibly basic. How could you release the product without even testing for that?
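To illustrate the kind of automated regression testing the reviewer is describing, from the user's side rather than the vendor's: a minimal sketch, assuming JUnit 4, the PDI/Kettle Java libraries on the classpath, and a hypothetical load_customers.ktr transformation, that simply runs the transformation and fails the build if any step reports errors.

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class CustomerLoadRegressionTest {

    @Test
    public void transformationFinishesWithoutErrors() throws Exception {
        KettleEnvironment.init();                       // initialize the PDI engine once
        TransMeta meta = new TransMeta("src/test/resources/load_customers.ktr"); // hypothetical file
        Trans trans = new Trans(meta);
        trans.execute(null);                            // run with no command-line arguments
        trans.waitUntilFinished();
        assertEquals("ETL should finish without step errors", 0, trans.getErrors());
    }
}
```

Running a handful of tests like this against a new PDI release would catch basic breakage, such as a broken database connection, before it reaches production.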
What do I think about the scalability of the solution?
We don't have a huge amount of data, so I can't really answer how we could scale up to very large solutions.
How are customer service and support?
Lumada’s ability to quickly and effectively solve issues we have brought up is not great. We have a support contract for the solution with Hitachi, but I don't get the sense that Pentaho (and Hitachi still calls it Pentaho) is a huge center of focus for them.
You kind of get help, but the people from whom you get help aren't necessarily super strong. It often goes around in circles forever. I eventually have to find my own solution.
I haven't found that the Hitachi support site has much depth of understanding of the solution. They can answer simple questions, but when it gets more in-depth, they have a lot of trouble. I don't think the support people have the depth of expertise to really deal with difficult questions.
I would rate them as five out of 10. They are responsive and polite. I don't feel ignored or anything like that, just the depth of knowledge isn't there.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
It has always been here. There was no solution like it until I got to the company.
How was the initial setup?
The initial setup was complex because we had to integrate with SAML. Even though they had some guidance on that, it was really a do-it-yourself kind of thing. That was pretty complicated, so if they want to keep this product fresh, I think they have to work on making it integrate more with modern technology, like single sign-on. Every organization has that now, and Pentaho doesn't have a good story for it. However, that is a part of the platform they don't give a lot of love to.
It took us a long time to figure it out, something like two weeks.
What was our ROI?
This has reduced our ETL development time. If it wasn't for this solution, we would be doing custom coding. The reason we are using the solution is its simplicity of development.
What's my experience with pricing, setup cost, and licensing?
These types of solutions are expensive, so we really appreciate what we get for our money, though we don't think of it as a top-of-the-line solution or anything like that.
Which other solutions did I evaluate?
Apache has a project going on called Apache Hop. Because Pentaho was open sourced, people have taken it and forked it, and they are really modernizing the solution. As far as I know, Hitachi is not involved yet. I would highly advise them to get involved in that open-source project. It will be the next generation of Pentaho, and if they get left behind, they're not going to have anything. It would be a very bad move to just ignore it. Hitachi should not ignore Apache Hop.
What other advice do I have?
I really like the data integration tool. However, it is part of a whole platform of tools, and it is obvious the other tools just don't get a lot of love. We are in it for Pentaho Data Integration (PDI) because that is what we want as our ETL tool. We use their reporting platform and stuff like that, but it is obvious that they just don't get a lot of love or concern.
I haven't looked at the roadmap that much. We are also a Google customer using BigQuery, etc. Hitachi is really just a very niche part of what we do. Therefore, we are not generally looking very seriously at what Hitachi is doing with their products, nor are we heavily invested in what Hitachi is doing.
I would recommend this specific Hitachi product to a friend or colleague, depending on their use case and need. If they have a very similar need, I would recommend it. I wouldn't be saying, "Oh, this is the best thing next to sliced bread," but say, "Hey, if this is what you need, this works well for us."
On a scale of one to 10 for recommending the product, I would rate it as seven out of 10. Overall, I would also rate it as seven out of 10.
We really appreciated the breadth of its capabilities. It is not the top-of-the-line solution, but you really get a lot for what you pay for.
Which deployment model are you using for this solution?
Hybrid Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Project Leader at a mining and metals company with 10,001+ employees
Speeds up data flow processes and has a user-friendly interface
Pros and Cons
- "It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
- "As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."
What is our primary use case?
The company where I was working previously was using this product. We were using it for ETL process management; it was essentially data flow automation.
In terms of deployment, we were using an on-premise model because we had sensitive data, and there were some restrictions related to information security.
How has it helped my organization?
Our data flow processes became faster with this solution.
What is most valuable?
It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient.
What needs improvement?
As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows.
The last time I saw this product, the onboarding instructions were not clear. If the process of onboarding this product is made more clear, it will take the product to the next level. There is a possibility that the onboarding process has already improved, and I haven't seen it.
For how long have I used the solution?
I have used this solution for two or three years.
What do I think about the stability of the solution?
I would rate it an eight out of ten in terms of stability.
What do I think about the scalability of the solution?
We didn't have to scale too much. So, I can't evaluate it properly in terms of scalability.
In terms of its users, only our team was using it. There were approximately 20 users. It was not for the whole company.
How are customer service and support?
We didn't use customer support much. We used open-source resources found through Google searches. There were some helpful forums where we were able to find the answers to our questions.
Which solution did I use previously and why did I switch?
I didn't use any other solution previously. This was the only one.
How was the initial setup?
I wasn't a part of its deployment. In terms of maintenance, as far as I know, it didn't require much maintenance.
What was our ROI?
We absolutely saw an ROI. It was hard to calculate, but we felt it in terms of the speed of our processes. After using this product, we could do some of the things much faster than before.
What's my experience with pricing, setup cost, and licensing?
I mostly used the open-source version. I didn't work with a license.
Which other solutions did I evaluate?
I did not evaluate other options.
What other advice do I have?
I would recommend using this product for data engineering and Extract, Transform, and Load (ETL) processes.
I would rate it an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Senior Data Analyst at a tech services company with 51-200 employees
We're able to query large data sets without affecting performance
Pros and Cons
- "One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results."
- "Parallel execution could be better in Pentaho. It's very simple but I don't think it works well."
What is our primary use case?
I use it for ETL. We receive data from our clients and we join the most important information and do many segmentations to help with communication between our product and our clients.
How has it helped my organization?
Before we used Pentaho, our processes were in Microsoft Excel and the updates from databases had to be done manually. Now all our routines are done automatically and we have more time to do other jobs. It saves us four or five hours daily.
In terms of ETL development time, it depends on the complexity of the job, but if the job is simple it saves two or three hours.
What is most valuable?
One of the most valuable features is the ability to create many API integrations. I'm always working with advertising agents and using Facebook and Instagram to do campaigns. We use Pentaho to get the results from these campaigns and to create dashboards to analyze the results.
I'm working with large data sets. One of the clients I'm working with is a large credit card company and its database is very large. Pentaho allows me to query large data sets without affecting performance.
I use Pentaho with Jenkins to schedule the jobs. I'm using the jobs and transformations in Pentaho to create many links.
I always find ways to have minimal code and create the processes with many parameters. I am able to reuse processes that I have created before.
Creating jobs and putting them into production is very simple, and Pentaho gives good visibility into them.
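To make the parameter-driven reuse concrete: a transformation is built once and then run with different parameter values each time. Below is a minimal sketch using the Kettle Java API; the file path, parameter names, and values are hypothetical, and the same parameters can also be passed on the command line to pan.sh or kitchen.sh as -param:NAME=value.

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunCampaignExport {
    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();
        // One transformation, reused for any campaign and date by changing parameters.
        TransMeta meta = new TransMeta("/opt/etl/transformations/campaign_results.ktr"); // hypothetical path
        Trans trans = new Trans(meta);
        trans.setParameterValue("CAMPAIGN_ID", "spring_sale");   // hypothetical parameters
        trans.setParameterValue("RUN_DATE", "2024-04-01");
        trans.activateParameters();                              // make the values visible to the steps
        trans.execute(null);
        trans.waitUntilFinished();
        if (trans.getErrors() > 0) {
            throw new IllegalStateException("Transformation finished with errors");
        }
    }
}
```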
What needs improvement?
Parallel execution could be better in Pentaho. It's very simple but I don't think it works well.
For how long have I used the solution?
I've been working with Pentaho for four or five years.
What do I think about the stability of the solution?
The stability is good.
What do I think about the scalability of the solution?
It's scalable.
How are customer service and support?
I find help on the forums.
Which solution did I use previously and why did I switch?
I used SQL Server Integration Services, but I have much more experience with Pentaho. I have also worked with Apache NiFi, but it is more focused on single data flows, whereas I'm always working with batch processes and large data sets.
How was the initial setup?
The first deployment was very complex because we didn't have experience with the solution, but the next deployment was simpler.
We create jobs weekly in Pentaho. The development time takes, on average, one week and the deployment takes just one day or so.
We just put it on Git, pull it onto a server, and schedule the execution.
We use it on-premises while the infrastructure is Amazon and Azure.
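For readers picturing the "schedule the execution" part: on a server this usually comes down to invoking PDI's kitchen.sh job runner with the job file and its parameters. Below is a minimal sketch of a wrapper that a scheduler such as cron or Jenkins could call, written in Java only to keep the examples here in one language; the installation path, job file, and parameter are hypothetical.

```java
import java.io.IOException;

public class RunNightlyLoad {
    public static void main(String[] args) throws IOException, InterruptedException {
        // kitchen.sh is PDI's command-line job runner; paths and parameters below are examples only.
        ProcessBuilder pb = new ProcessBuilder(
                "/opt/pentaho/data-integration/kitchen.sh",
                "-file=/opt/etl/jobs/nightly_load.kjb",
                "-param:RUN_DATE=2024-04-01",
                "-level=Basic");
        pb.inheritIO();                       // stream PDI's log output to this process's console
        int exitCode = pb.start().waitFor();
        if (exitCode != 0) {
            throw new IllegalStateException("PDI job failed with exit code " + exitCode);
        }
    }
}
```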
What other advice do I have?
I always recommend Pentaho for working with automated processes and to do API integrations.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
BI Analyst at a computer software company with 51-200 employees
Simple to use, supports custom transformations, and the open-source version can be used free of charge
Pros and Cons
- "This solution allows us to create pipelines using a minimal amount of custom coding."
- "I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
What is our primary use case?
I have used this ETL tool for working with data in projects across several different domains. My use cases include tasks such as transforming data taken from an API like PayPal, extracting data from different sources such as Magento or other databases, and transforming all of the information.
Once the transformation is complete, we load the data into data warehouses such as Amazon Redshift.
How has it helped my organization?
There are a lot of different benefits we receive from using this solution. For example, we can easily accept data from an API and create JSON files. The integration is also very good.
I have created many data pipelines and after they are created, they can be reused on different levels.
What is most valuable?
The best feature is that it's simple to use. There are simple data transformation steps available, such as trimming data or performing different types of replacement.
This solution allows us to create pipelines using a minimal amount of custom coding. Anyone in the company can do so, and it's just a simple step. If any coding is required then we can use JavaScript.
What needs improvement?
I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors. If there is a large amount of data then there is definitely a lag.
I would like to see a cloud-based deployment because it will allow us to easily handle a large amount of data.
For how long have I used the solution?
I have been working with Hitachi Lumada Data Integration for almost three years, across two different organizations.
What do I think about the stability of the solution?
There is definitely some lag but with a little improvement, it will be a good fit.
What do I think about the scalability of the solution?
This is a good product for an enterprise-level company.
We use this solution for all of our data integration jobs. It handles the transformation. As our business grows and the demand for data integration increases, our usage of this tool will also increase.
Between versions, they have added a lot of plugins.
How are customer service and support?
The technical support does not reply in a timely manner. I have filled out the support request form, one or two times, asking about different things, but I have not received a reply.
The support they have in place does not work very well. I would rate them one or two out of ten.
How would you rate customer service and support?
Negative
Which solution did I use previously and why did I switch?
In this business, they began with this product and did not use another one beforehand. I have also worked with a cloud-based integration tool.
How was the initial setup?
The initial setup and deployment are straightforward.
I have deployed it on different servers and on average, it takes an hour to complete. I have not read any documentation regarding installation. With my experience, we were able to set everything up.
What's my experience with pricing, setup cost, and licensing?
I primarily work on the Community Version, which is available to use free of charge. I have asked for pricing information but have not yet received a response.
What other advice do I have?
We are currently using version 8.3 but version 9 is available. More features to support big data are available in the newest release.
My advice for anybody considering this product: if you're looking for any kind of custom transformation, or you're gleaning data from multiple sources and sending it to multiple destinations, I definitely recommend this tool.
Overall, this is a good product and I recommend it.
I would rate this solution an eight out of ten.
Which deployment model are you using for this solution?
On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees
Connects to different databases, data sources, files, and SFTP
Pros and Cons
- "I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a station area where I can work with all the information that I have in my production databases, then I can work with the data that I created."
- "I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector."
What is our primary use case?
I just use it as an ETL. It is a tool that helps me work with data so I can solve any of my production problems. I work with a lot of databases. Therefore, I use this tool to keep information organized.
I work with a virtual private cloud (VPC) and VPN. If I work in the cloud, I use VPC. If I work on-premises, I work with VPNs.
How has it helped my organization?
I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a staging area where I can work with all the information that I have in my production databases, then I can work with the data that I created.
Right now, I am working in the business intelligence area. However, we use BI across all our companies, so it is not only one area. I create different data marts for different business units, e.g., HR, IT, sales, and marketing.
What is most valuable?
A valuable feature is the number of connectors it has, so I can connect to different databases, data sources, files, and SFTP. With SQL and NoSQL databases, I can connect, put the data in my instructions, send it to my staging area, and create the format. Thus, I can format all my data in just one process.
What needs improvement?
I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector.
Hitachi can make a lot of improvements in the tool, e.g., in performance or latency or putting more emphasis on cloud solutions or NoSQL databases.
For how long have I used the solution?
I have more than 15 years of experience working with it.
What do I think about the stability of the solution?
The stability depends on the version. At the beginning, it was more focused on stability. As of now, some things have been deprecated. I really don't know why. However, I have been pretty happy with the tool. It is a very good tool. Obviously, there are better tools, but Pentaho is fast and pretty easy to use.
What do I think about the scalability of the solution?
It is scalable.
How are customer service and support?
Their support team will receive a ticket on any failures that you might have. We have a log file that lets us review our errors, both in Windows and Unix. So, we are able to check both operating systems.
If you don't pay any license, you are not allowed to use their support at all. While I have used it a couple of times, that was more than 10 years ago. Now, I just go to their community and any Pentaho forums. I don't use the support.
Which solution did I use previously and why did I switch?
I have used a lot of ETL data integrators, such as DataStage, Informatica, Talend, Matillion, Python, and even SQL. MicroStrategy, Qlik, and Tableau have instructional features, and I try to use a lot of tools to do instructions.
How was the initial setup?
I have built the solution myself. The setup does not change between cloud and on-premises deployments.
You create in your development environments, then you move to test. After that, you do the volume and integrity testing, then you go to UAT. Finally, you move to production. It does depend on the customer. You can thoroughly create the entire product structure as well as all the files that you need. Once you put it in production, it should work. You should have the same structure in development, test, and production.
What was our ROI?
It is free. I don't spend money on it.
It will reduce a lot of the time that you work with data.
What's my experience with pricing, setup cost, and licensing?
I use it because it is free. I download from their page for free. I don't have to pay for a license. With other tools, I have to pay for the licenses. That is why I use Pentaho.
I used to work with the complete suite of Pentaho, not only Data Integration. I used to build some solutions from scratch. I used to work with the Community version and Enterprise versions. With the Enterprise version, it is more than building cubes. I am building a BI solution that I can explore. Every time that I use Pentaho Data Integration, I never spend any money because it comes free with the tool. If you pay for the Enterprise license, Pentaho Data Integration is included. If you don't pay for it and use the Community version, Data Integration is included for free.
Which other solutions did I evaluate?
I used to work with a reseller of Pentaho. That is why I started working with it. Also, I did some training for Pentaho at the company that I used to work for in Argentina, where we were a Platinum reseller.
Pentaho is easy to use. You don't need to install anything. You can just open the script and start working on it. That is why I chose it. With Informatica, you need to do a server installation, but some companies might not allow some installation in their production or normal environment.
I feel pretty comfortable using the solution. I have tried to use other tools, but I always come back to Pentaho because it is easier.
Pentaho is open source. While Informatica is a very good tool, it is pretty expensive. That is one of the biggest cons for the data team because you don't want to pay money for tools that only help you do your work.
What other advice do I have?
I would rate this solution as eight out of 10. One of the best things about the solution is that it is free.
I used to sell Pentaho. It has a lot of pros and cons, but from my side, there are more pros than cons. There isn't one tool that can do everything you need, but this is one of those tools that helps you complete your tasks, and it is pretty easy to integrate with other tools. So, you can switch Pentaho on and off from different tools and operating systems. You can use it on Unix, Linux, Windows, and Mac.
If you know how to develop different things and are very good at Java, you can create your own connectors. You can create a lot of things.
It is a very good tool if you need to work with data. There isn't a database that you can't manage with this tool. You can work with it and manage all the data that you want to manage.
Which deployment model are you using for this solution?
Hybrid Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Project Manager at a computer software company with 51-200 employees
Forums are helpful, and creating ETL jobs is simpler than in other solutions
Pros and Cons
- "I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You have to search all these different places using a mouse, clicking everywhere... each report is coded in a binary file... You cannot search with a text search tool..."
What is our primary use case?
I was working with Pentaho for a client. I had to implement complicated data flows and extraction. I had to take data from several sources in a PostgreSQL database by reading many tables in several databases, as well as from Excel files. I created some complex jobs. I also had to implement business reports with the Pentaho Report Designer.
The client I was working for had Pentaho on virtual machines.
What is most valuable?
The ETL feature was the most valuable to me. I like it very much. It was very good.
What needs improvement?
I was not happy with the Pentaho Report Designer because of the way it was set up. There was a zone and, under it, another zone, and under that another one, and under that another one. There were a lot of levels and places inside the report, and it was a little bit complicated. You had to search all these different places using a mouse, clicking everywhere. The interface does not enable you to find things and manage all that. I don't know if other tools are better for end-users when it comes to the graphical interface, but this was a bit complicated. In the end, we were able to do everything with Pentaho.
And when you want to improve the appearance of your report, Pentaho Report Designer has complicated menus. It is not very user-friendly. The result is beautiful, but it takes time.
Also, each report is coded in a binary file, so you cannot read it. Maybe that's what the community or the developers want, but it is inconvenient because when you want to search for information, you need to open the graphical interface and click everywhere. You cannot search with a text search tool because the reports are coded in binary. When you have a lot of reports and you want to find where a precise part of one of your reports is, you cannot do it easily.
The way you specify parameters in Pentaho Report Designer is a little bit complex. There are two interfaces. The job creators use PDI, which provides the ETL interface, and it's okay. Creating the jobs for extract/transform/load is simpler than in other solutions. But there is another interface for the end-users of Pentaho, and you have to understand how the two relate to each other, so it's a little bit complex. You have to go into XML files, which is not so simple.
Also, using the solution overall is a little bit difficult. You need to be an engineer and somebody with a technical background. It's not absolutely easy, it's a technical tool. I didn't immediately understand it and had to search for information and to think about it.
For how long have I used the solution?
I used Hitachi Lumada Data Integration, Pentaho, for approximately two years.
What do I think about the stability of the solution?
The stability was perfect.
What do I think about the scalability of the solution?
I didn't scale the solution. I had to migrate from an old Pentaho to a new Pentaho. I had quite a big set of data, but I didn't add new data. I worked with the same volume of data all the time so I didn't test the scaling.
In the company I consulted for, there were about 15 people who input the data and worked with the technical part of Pentaho. There were a lot of end-users, who were the people interested in the reports; on the order of several thousand end-users.
How are customer service and support?
The technical support was okay. I used the open-source version of Pentaho and I used the forum. I found what I needed. And, the one or two times when I didn't find something, I asked a question in the forum and I received an answer very quickly. I appreciated that a lot. I had an answer one or two hours later. It's very good that somebody from Pentaho Enterprise responds so rapidly.
How was the initial setup?
The initial setup was complex, but I'm an engineer and it's my job to deal with complex systems. It's not the most complex that I have dealt with, but it was still somewhat complex. The procedure was explained on the Pentaho website in the documentation. You had to understand which module does what. It was quite complex.
It took quite a long time because I had to troubleshoot, to understand what was wrong, and I had to do it several times before it worked.
What's my experience with pricing, setup cost, and licensing?
I didn't purchase Pentaho. There is a business version but I used only the open source. I was fully satisfied and very happy with it. It's a very good open-source solution. The communication channels, the updates, the patches, et cetera are all good.
What other advice do I have?
I would fully recommend Pentaho. I have already recommended it to some colleagues. It's a good product with good performance.
Overall, I was very happy with it. It was complicated, but that is part of my job. I was happy with the result and the stability. The Data Integration product is simpler than the Report Designer. I would rate the Data Integration at 10 out of 10 and the Report Designer at nine, because of the graphical interface.
Disclosure: My company has a business relationship with this vendor other than being a customer. System integrator
Systems Analyst at a university with 5,001-10,000 employees
Reuse of ETLs with metadata injection saves us development time, but the reporting side needs notable work
Pros and Cons
- "The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs."
- "The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."
What is our primary use case?
We use it as a data warehouse between our HR system and our student system, because we don't have an application that sits in between them. It's a data warehouse that we do our reporting from.
We also have integrations to other, isolated apps within the university that we gather data from. We use it to bring that into our data warehouse as well.
How has it helped my organization?
Lumada Data Integration definitely helps with decision-making for our deans and upper executives. They are the ones who use the product the most to make their decisions. The data warehouse is the only source of information that's available for them to use, and to create that data warehouse we had to use this product.
And it has absolutely reduced our ETL development time. The fact that we're able to reuse some of the ETLs with the metadata injection saves us time and costs. It also makes it a pretty quick process for our developers to learn and pick up ETLs from each other. It's definitely easy for us to transition ETLs from one developer to another. The ETL functionality satisfies 95 percent of all our needs.
What is most valuable?
The ETL is definitely an awesome feature of the product. It's very easy and quick to use. Once you understand the way it works it's pretty robust.
Lumada Data Integration requires minimal coding. You can do more complex coding if you want to, because it has a scripts option that you can add as a feature, but we haven't found a need to do that yet. We just use what's available, the steps that they have, and that is sufficient for our needs at this point. It makes it easier for other developers to look at the things that we have developed and to understand them quicker, whereas if you have complex coding it's harder to hand off to other people. Being able to transition something to another developer, and having that person pick it up quicker than if there were custom scripting, is an advantage.
In addition, the solution's ability to quickly and effectively solve issues we've brought up has been great. We've been able to use all the available features.
Among them is the ability to develop and deploy data pipeline templates once and reuse them. The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs. The automation of data pipeline templates has also been helpful in scaling the onboarding of data.
What needs improvement?
The transition to the web-based solution has taken a little longer and been more tedious than we would like, and it has taken development effort away from the reporting side of the tool. They have a reporting tool called Pentaho Business Analytics that does all the report creation based on the data integration tool. A lot of features are missing from that product because they've allocated a lot of their resources to fixing the data integration side to make it more web-based. We would like them to focus more on the user interface for the reporting.
The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet. We have between 500 and 800 reports in our system now. We've had to maintain an external spreadsheet with IDs to identify the location of all of those reports, instead of having that built into the system. It's been frustrating for us that they can't just build a simple search feature into the product to search for report names. It needs to be more in line with other reporting tools, like Tableau. Tableau has a lot more features and functions.
Because the reporting is lacking, only the deans and above are using it. It could be used more, and we'd like it to be used more.
Also, while the solution provides us with a single, end-to-end data management experience from ingestion to insights, it doesn't give us a full history of where the data is coming from. If we change a field, we can't trace it through from the reporting back to the ETL field. Unfortunately, it's a manual process for us. Hitachi has a new product to do that, which searches all the fields, documents, and files to get your pipeline mapped, but we haven't bought that product yet.
For how long have I used the solution?
I've been using Lumada Data Integration since version 4.2. We're now on version 9.1.
What do I think about the stability of the solution?
The stability has been great. Other than for upgrades, it has been pretty stable.
What do I think about the scalability of the solution?
The scalability is great too. We've been able to expand the current system and add a lot of customizations to it.
For maintenance, surprisingly, it's just me who does so in our organization.
How are customer service and support?
The only issue that we've had is that it takes a little longer than we would like for support to resolve something, although things do eventually get incorporated. They're very quick to respond to an issue, but the fixing of the issue is not as quick.
For example, a few versions ago, when we upgraded it, we found that the upgrade caused a whole bunch of issues with the Oracle data types and the way the ETL was working with them. It wasn't transforming to the data types properly, the way we were expecting it to. In the previous version that we were using it was working fine, but the upgrade caused the issue, and it took them a while to fix that.
How would you rate customer service and support?
Neutral
Which solution did I use previously and why did I switch?
We didn't have another tool. This is the only tool we have used to create the data warehouse between the two systems. When we started looking at solutions, this one was great because it was open source and Java-based, and it had a Community Edition. But we actually purchased the Enterprise Edition.
How was the initial setup?
I came in after it was purchased and after the first deployment.
What's my experience with pricing, setup cost, and licensing?
We renew our license every two years. When I spoke to the project manager, he indicated that the pricing has been going up every two years. It's going to reach a point where, eventually, we're going to have to look at alternative solutions because of the price.
When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho. When they bought it, the price shot up. They said the increase is because of all the improvements they put into the product and the support that they're providing. From our point of view, their improvements are mostly on the data integration part of it, instead of the reporting part of it, and we aren't particularly happy with that.
Which other solutions did I evaluate?
I've used Tableau and other reporting tools, but Tableau sticks out because the reporting tool is much nicer. Tableau has its drawbacks with the ETL, because you can only use Tableau datasets. You have to get data into a Tableau file dataset and then the ETL part of it is stuck in Tableau forever.
If we could use the Pentaho ETL and the Tableau reporting we'd be happy campers.
What other advice do I have?
It's a great product. The ETL part of the product is really easy to pick up and use. It has a graphical interface with the ability to be more complex via scripting and features that you can add.
When looking at Hitachi Vantara's roadmap, the ability to upgrade more easily is one element of it that is important to us. Also, they're going more towards web-based solutions, instead of having local client development tools. If it does go on the web, and it works the same way it works on the client, that would be a nice feature. Currently, because we have these local client development tools, we have to have a VM client for our developers to use, and that makes it a little more tricky. Whereas if they put it on the web, then all our developers would be able to use any desktop and access the web for development.
When it comes to the query performance of the solution on large datasets, we haven't had any issues with it. We have one table in our data warehouse that has about 120 million rows and we haven't had any performance issues.
The solution gives you the flexibility to deploy it in any environment, whether on-prem or in the cloud. With our particular implementation, we've done a lot of customizations. We have special things that we bolted onto the product, so it's not as easy to put it onto the cloud for us. All of our customizations and bolt-ons end up costing us more because they make upgrades more difficult and time-consuming. We don't use an automated upgrade process. It's manual. We have to do a full reinstall and then apply all our bolt-ons and make sure it still works. If we could automate that process it would certainly reduce our costs.
In terms of updating to version 9.2, which is the latest version, we're going to look into it next year and see what level of effort is required and determine how it impacts our current system. They release a new update about every six months, and there is a major release every year or two, so it's quite a fast schedule for updates.
Overall, I would rate our satisfaction with our decision to purchase Hitachi products as a seven out of 10. I would definitely recommend the data integration tool but I wouldn't recommend the reporting tool.
Which deployment model are you using for this solution?
On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Analytics Team Leader at a healthcare company with 11-50 employees
Enables us to manage our workload and generate a high volume of reporting
Pros and Cons
- "We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule."
- "Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in."
What is our primary use case?
We use it to connect to multiple databases and generate reporting. We also have ETL processes running on it.
Portions of it are in AWS, but we also have desktop access.
How has it helped my organization?
The solution has allowed us to automate reporting by automating its scheduling.
It is also important to us that the solution enables you to leverage metadata to automate data pipeline templates and reuse them. It allows us to generate reports with fewer resources.
If we didn't have this solution, we wouldn't be able to manage our workload or generate the volume of reporting that we currently do. It's very important for us that it provides a single, end-to-end data management experience from ingestion to insights. We are a high-volume department and without those features, we wouldn't be able to manage the current workload.
What is most valuable?
We're using the PDI and the repository function, and they give us the ability to easily generate reporting and output, and to access data. We also like the ability to schedule.
What needs improvement?
Since Hitachi took over, I don't feel that the documentation is as good within the solution. It used to have very good help built right in. There's good documentation when you go to the site but the help function within the solution hasn't been as good since Hitachi took over.
For how long have I used the solution?
I've been using Lumada Data Integration since 2016, but the company has been using it much longer.
We are currently on version 8.3, but we're going to be doing an upgrade to 9.2 next month.
What do I think about the stability of the solution?
The stability is good. We haven't had any issues related to Pentaho.
What do I think about the scalability of the solution?
Its scalability is very good. We use it with multiple, large databases. We've added to it over time and it scales.
We have about 10 users of the solution including a data quality manager, clinical analyst, healthcare informatics analysts, senior healthcare informatics analyst, and an analytics team leader. It's used very extensively by all of those job roles in their day-to-day work. When we add additional staff members, they routinely get access to and are trained on the solution.
How are customer service and support?
Their ability to quickly and effectively solve issues we have brought up is very good. They have a ticketing system and they're very responsive to any tickets we enter. And that's true not only for issues but if we have questions about functionality.
How would you rate customer service and support?
Positive
How was the initial setup?
The solution is very flexible. It's pretty easy to set up connections within the solution.
Maintenance isn't required day-to-day. Our technical staff does the upgrades. They also, on occasion, have to do things like restarting the services, but that's typically related to server issues, not Pentaho itself.
What other advice do I have?
My advice would be to take advantage of the training that's offered.
The query performance of Lumada on large data sets is good, but the query performance is really only as good as the server.
In terms of Hitachi's roadmap, we haven't seen it in a little while. We did have a concern that they're going to be going away from Pentaho and rolling it into another product and we're not quite sure what the result of that is going to be. We don't have a good understanding of what's going to change. That's the concern.
We currently only use Pentaho. We don't have other Hitachi products but we're satisfied with it. We would recommend Pentaho.
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.