MARIA PILAR CANDA - PeerSpot reviewer
Associate Partner at Autana Business Partners
Real User
Top 5
Sep 23, 2024
Efficient data integration with cost savings, but may be less efficient for large data volumes
Pros and Cons
  • "It is easy to use, install, and start working with."
  • "Larger data jobs take more time to execute."

What is our primary use case?

I have a team who has experience with integration. We are service providers and partners. Generally, clients buy the product directly from the company.

How has it helped my organization?

It is easy to use, install, and start working with. This is one of its advantages compared to other products. The relationship between price and functionality is excellent, resulting in time and money savings of between twenty-five and thirty percent.

What is most valuable?

One of the advantages is that it is easy to use, install, and start working with. For certain volumes of data, the solution is very efficient.

What needs improvement?

Pentaho may be less efficient for large volumes of data compared to other solutions like Talend or Informatica. Larger data jobs take more time to execute.

Pentaho is more appropriate for jobs with smaller volumes of data.

Buyer's Guide
Pentaho Data Integration and Analytics
June 2026
Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: June 2026.
900,838 professionals have used our research since 2012.

For how long have I used the solution?

I have used the solution for more than ten years.

What do I think about the stability of the solution?

The solution is stable. Generally, one person can manage and maintain it.

What do I think about the scalability of the solution?

Sometimes, for large volumes of data, a different solution might be more appropriate. Pentaho is suited for smaller volumes of data, while Talend is better for larger volumes.

How are customer service and support?

Based on my experience, the solution has been reliable.

Which solution did I use previously and why did I switch?

We did a comparison between Talend and Pentaho last year.

How was the initial setup?

The initial setup is straightforward. It is easy to install and start working with.

What about the implementation team?

A team with experience in integration manages the implementation.

What was our ROI?

The relationship between price and functionality is excellent. It results in time and money savings of between twenty-five and thirty percent.

What's my experience with pricing, setup cost, and licensing?

Pentaho is cheaper than other solutions. The relationship between price and functionality means it provides good value for money.

Which other solutions did I evaluate?

We evaluated Talend and Pentaho.

What other advice do I have?

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer. MSP
PeerSpot user
Senior Product Manager at a retailer with 10,001+ employees
Real User
Top 10
Jul 30, 2024
Loads data into the required tables and can be plug-and-played easily

What is our primary use case?

The use cases involve loading the data into the required tables based on the transformations. We do a couple of transformations, and based on the business requirement, we load the data into the required tables.

What is most valuable?

It's a very lightweight tool. It can be plug-and-played easily and read data from multiple sources. It's a very good tool for small to large companies. People or customers can learn very easily to do the transformations for loading and migrating data. It's a fantastic tool in the open-source community.

Compared to commercial ETL tools, this is a free tool that you can download and use to do many of the things the commercial tools do. It's a pretty good tool by comparison. It's available in community and enterprise editions. It's very easy to use.

What needs improvement?

It is difficult to process huge amounts of data. We need to test it end to end to determine how much data it can process. With the enterprise edition, we can process the data.

For how long have I used the solution?

I have been using Pentaho Data Integration and Analytics for 11-12 years.

What do I think about the stability of the solution?

We process a small amount of data, but it's pretty good.

What do I think about the scalability of the solution?

It's scalable across any machine.

How are customer service and support?

Support is satisfactory. A few of my colleagues are also there, working with Hitachi to provide solutions whenever a ticket or Jira is raised for them. 

How would you rate customer service and support?

Positive

How was the initial setup?

Installation is very simple. Whether you go with the community or the enterprise edition, it's damn simple. Anyone can install it very easily.

One person is enough for the installation.

What's my experience with pricing, setup cost, and licensing?

The product is quite cheap.

What other advice do I have?

It can quickly implement slowly changing dimensions and efficiently read flat files, loading them into tables quickly. Additionally, the "number of copies to start" setting on a step enables parallel partitioning. In the Enterprise Edition, you can restart your jobs from where they left off, a valuable feature for ensuring continuity. Detailed metadata integration is also very straightforward, which is an advantage. It is lightweight and can work on various systems.

Any technical guy can do everything end to end.

Overall, I rate the solution a ten out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Solution Integration Consultant II at a tech vendor with 201-500 employees
Consultant
Jun 8, 2022
Reduces the effort required to build sophisticated ETLs
Pros and Cons
  • "We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues in respect to deployment of code or with code working in one environment but not working in another environment. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines."
  • "We have seen at least 30% savings in terms of effort, which has helped us to price our service and products more aggressively in the market and win more clients."
  • "It could be better integrated with programming languages, like Python and R. Right now, if I want to run a Python code on one of my ETLs, it is a bit difficult to do. It would be great if we have some modules where we could code directly in a Python language. We don't really have a way to run Python code natively."

What is our primary use case?

My work primarily revolves around data migration and data integration for different products. I have used them in different companies, but for most of our use cases, we use it to integrate all the data that needs to flow into our product. We also have outbound flows from our product when we need to send data to various integration points. We use this product extensively to build ETLs for those use cases.

We are developing ETLs for the inbound data into the product as well as outbound to various integration points. Also, we have a number of core ETLs written on this platform to enhance our product.

We have two different modes that we offer: one is on-premises and the other is on the cloud. On the cloud, we have an EC2 instance on AWS where we have installed the tool, and we call that instance the ETL server. We also have another server for the application where the product is installed.

We use version 8.3 in the production environment, but in the dev environment, we use version 9 and onwards.

How has it helped my organization?

We have been able to reduce the effort required to build sophisticated ETLs. Also, we now are in the migration phase from an on-prem product to a cloud-native application. 

We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues in respect to deployment of code or with code working in one environment but not working in another environment. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines.

What is most valuable?

The metadata injection feature is the most valuable because we have used it extensively to build frameworks, where we have used it to dynamically generate code based on different configurations. If you want to make a change at all, you do not need to touch the actual code. You just need to make some configuration changes and the framework will dynamically generate code for that as per your configuration. 
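As a rough illustration of the idea the reviewer describes, the sketch below shows a generic pipeline "template" whose behavior is driven entirely by configuration. It is only an analogy written in plain Python, not Pentaho's metadata injection API; the function `run_template`, the config keys, and the sample fields are all hypothetical.

```python
# Illustrative only: a generic "template" pipeline driven by configuration.
# This mimics the idea of metadata injection; it is NOT Pentaho's API, and
# every name here (run_template, config keys, fields) is hypothetical.

def run_template(rows, config):
    """Apply the renames and the required-field filter described in `config`."""
    renames = config["rename"]            # source field -> target field
    required = config["required_fields"]  # rows missing any of these are dropped
    output = []
    for row in rows:
        if any(row.get(field) in (None, "") for field in required):
            continue  # skip rows that lack a required value
        output.append({renames.get(key, key): value for key, value in row.items()})
    return output


rows = [
    {"cust_id": 1, "nm": "Ada"},
    {"cust_id": 2, "nm": ""},
]

# Changing this dictionary changes what the pipeline does;
# run_template itself stays untouched.
customer_config = {
    "rename": {"cust_id": "customer_id", "nm": "name"},
    "required_fields": ["nm"],
}

result = run_template(rows, customer_config)
print(result)
```

Onboarding a new source would then mean writing a new configuration dictionary rather than editing the template, which is the kind of saving the reviewer attributes to the framework.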

We have a UI where we can create our ETL pipelines as needed, which is a key advantage for us. This is very important because it reduces the time to develop for a given project. When you need to build the whole thing using code, you need to do multiple rounds of testing. Therefore, it helps us to save some effort on the QA side.

Hitachi Vantara's roadmap has a pretty good list of features that they have been releasing with every new version. For instance, in version 9, they have included metadata injection for some of the steps. The most important elements of this roadmap to our organization’s strategy are the data-driven approach that this product is taking and the fact that we have a very low-code platform. Combining these two is what gives us the flexibility to utilize this software to enhance our product.

What needs improvement?

It could be better integrated with programming languages, like Python and R. Right now, if I want to run a Python code on one of my ETLs, it is a bit difficult to do. It would be great if we have some modules where we could code directly in a Python language. We don't really have a way to run Python code natively. 

For how long have I used the solution?

I have been working with this tool for five to six years.

What do I think about the stability of the solution?

They are making it a lot more stable. Earlier, stability used to be an issue when it was not with Hitachi. Now, we don't see those kinds of issues or bugs within the platform because it has become far more stable. Also, we see a lot of new big data features, such as connecting to the cloud.

What do I think about the scalability of the solution?

Lumada is flexible to deploy in any environment, whether on-premises or the cloud, which is very important. When we are processing data in batches on certain days, e.g., at the end of the week or month, we might have more data and need more processing power or RAM. However, most times, there might be very minimal usage of that CPU power. In that way, the solution has helped us to dynamically scale up, then scale down when we see that we have more data that we need to process.

The scalability is another key advantage of this product versus some of the others in the market since we can tweak and modify a number of parameters. We are really impressed with the scalability.

We have close to 80 people who are using this product actively. Their roles go all the way from junior developers to support engineers. We also have people who have very little coding knowledge and are more into the management side of things utilizing this tool.

How are customer service and support?

I haven't been part of any technical support discussions with Hitachi.

Which solution did I use previously and why did I switch?

We are very satisfied with our decision to purchase Hitachi's product. Previously, we were using another ETL service that had a number of limitations. It was not a modern ETL service at all. For anything, we had to rely on another third-party software. Then, with Hitachi Lumada, we don't have to do that. In that way, we are really satisfied with the orchestration or cloud-native steps that they offer. We are really happy on those fronts.

We were using something called Actian Services, which had fewer features and ended up costing more than the enterprise edition of Pentaho.

We could not do a number of things on Actian. For instance, we were unable to call other APIs or connect to an S3 bucket. It was not a very modern solution. Whereas, with Pentaho, we could do all these things as well as have great marketplaces where we could find various modules and third-party plugins. Those features were simply not there in the other tool.

How was the initial setup?

The initial setup was pretty straightforward. 

What about the implementation team?

We did not have any issues configuring it, even in my local machine. For the enterprise edition, we have a separate infrastructure team doing that. However, for at least the community edition, the deployment is pretty straightforward.

What was our ROI?

We have seen at least 30% savings in terms of effort. That has helped us to price our service and products more aggressively in the market, helping us to win more clients.

It has reduced our ETL development time. Per project, it has reduced by around 30% to 35%.

We can price more aggressively. We were actually able to win projects because we had great reusability of ETLs. A code that was used for one client can be reused with very minimal changes. We didn't have any upfront cost for kick-starting projects using the Community edition. It is only the Enterprise edition that has a cost. 

What's my experience with pricing, setup cost, and licensing?

For most development tasks, the Enterprise edition should be sufficient. It depends on the type of support that you require for your production environment.

Which other solutions did I evaluate?

We did evaluate SSIS since our database is based on Microsoft SQL server. SSIS comes with any purchase of an SQL Server license. However, even with SSIS, there were some limitations. For example, if you want to build a package and reuse it, SSIS doesn't provide the same kinds of abilities that Pentaho does. The amount of reusability reduces when we try to build the same thing using SSIS. Whereas, in Pentaho, we could literally reuse the same code by using some of its features.

SSIS comes with SQL Server and is easier to maintain, given that there are far more people who would have knowledge of SSIS. However, if I want to do PGP encryption or make an API connection, it is difficult. Creating a reusable package is not that easy, which would be the con for SSIS. 

What other advice do I have?

The query performance depends on the database. It is more likely to be good if you have a good database server with all the indexes and bells and whistles of a database. However, from a data integration tool perspective, I am not seeing any issues with respect to query performance.

We do not build visualization features that much with Hitachi. For reporting purposes, we use one of the tools from the product and prepare the data accordingly. 

We use this for all the projects that we are currently running. Going forward, we will be sticking only to using this ETL tool.

We haven't had any roadblocks using Lumada Data Integration.

On a scale of one to 10, I would recommend Hitachi Vantara to a friend or colleague as a nine.

If you need to build ETLs quickly in a low-code environment and don't want to spend a lot of time on the development side of things, this product fits. It is a little difficult to find resources with experience in it, so train them in this product. It is always worth that effort because it ends up saving a lot of time and resources on the development side of projects.

Overall, I would rate the product as a nine out of 10.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
RicardoDíaz - PeerSpot reviewer
COO / CTO at a tech services company with 11-50 employees
Real User
Jun 8, 2022
We can create pipelines with minimal manual or custom coding, and we can quickly implement what we need with its drag-and-drop interface
Pros and Cons
  • "Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that."
  • "Previously, I had three people to collect all the data and integrate all Excel spreadsheets, and it took them a day or two, but now I can do this work in about 15 minutes."
  • "In terms of the flexibility to deploy in any environment, such as on-premise or in the cloud, we can do the cloud deployment only through virtual machines. We might also be able to work on different environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. We can only do it through virtual machines, which is a problem, but we can manage it. We also work with Databricks because it works with Spark. We can work with clustered servers, and we can easily do the deployment in the cloud. With a right-click, we can deploy Databricks through the app on AWS or Azure cloud."
  • "Their technical support is not good. I would rate them 2 out of 10 because they don't have good technical skills to solve problems."

What is our primary use case?

We are a service delivery enterprise, and we have different use cases. We deliver solutions to other enterprises, such as banks. One of the use cases is for real-time analytics of the data we work with. We take CDC data from Oracle Database, and in real-time, we generate a product offer for all the products of a client. All this is in real-time. The client could be at the ATM or maybe at an agency, and they can access the product offer. 

We also use Pentaho within our organization to integrate all the documents and Excel spreadsheets from our consultants and have a dashboard for different hours for different projects.

In terms of version, currently, Pentaho Data Integration is on version 9, but we are using version 8.2. We have all the versions, but we work with the most stable one. 

In terms of deployment, we have two different types of deployments. We have on-prem and private cloud deployments.

How has it helped my organization?

I work with a lot of data. We have about 50 terabytes of information, and working with Pentaho Data Integration along with other databases is very fast.

Previously, I had three people to collect all the data and integrate all Excel spreadsheets. To give me a dashboard with the information that I need, it took them a day or two. Now, I can do this work in about 15 minutes.

It enables us to create pipelines with minimal manual coding or custom coding efforts, which is one of its best features. Pentaho is one of the few tools with which you can do anything you can imagine. Our business is changing all the time, and it is best for our business if I can use less time to develop new pipelines.

It provides the ability to develop and deploy data pipeline templates once and reuse them. I use them at least once a day. It makes my daily life easier when it comes to data pipelines.

Previously, I have used other tools such as Integration Services from Microsoft, Data Services for SAP, and Informatica. Pentaho reduces the ETL implementation time by 5% to 50%.

What is most valuable?

Pentaho from Hitachi is a suite of different tools. Pentaho Data Integration is a part of the suite, and I love the drag-and-drop functionality. It is the best. 

Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that.

What needs improvement?

Their client support is very bad. It should be improved. There is also not much information on Hitachi forums or Hitachi web pages. It is very complicated.

In terms of the flexibility to deploy in any environment, such as on-premise or in the cloud, we can do the cloud deployment only through virtual machines. We might also be able to work on different environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. We can only do it through virtual machines, which is a problem, but we can manage it. We also work with Databricks because it works with Spark. We can work with clustered servers, and we can easily do the deployment in the cloud. With a right-click, we can deploy Databricks through the app on AWS or Azure cloud.

For how long have I used the solution?

I have been using Pentaho Data Integration for 12 years. The first version that I tested and used was 3.2 in 2010.

How are customer service and support?

Their technical support is not good. I would rate them 2 out of 10 because they don't have good technical skills to solve problems.

How would you rate customer service and support?

Negative

How was the initial setup?

It is very quick and simple. It takes about five minutes.

What other advice do I have?

I have a good knowledge of this solution, and I would highly recommend it to a friend or colleague. 

It provides a single, end-to-end data management experience from ingestion to insights, but we have to create different pipelines to generate the metadata management. It's a little bit laborious to work with Pentaho, but we can do that.

I've heard a lot of people say it's complicated to use, but Pentaho is one of the few tools where you can do anything you can imagine. It is very good and quite simple, but you need to have the right knowledge and the right people to handle the tool. The skills needed to create a business intelligence solution or a data integration solution with Pentaho are problem-solving logic and maybe database knowledge. You can develop new steps, and you can develop new functionality in Pentaho Lumada, but you must have the knowledge of advanced Java programming. Our experience, in general, is very good. 

Overall, I am satisfied with our decision to purchase Hitachi's product services and solutions. My satisfaction level is at an eight out of ten.

I am not much aware of the roadmap of Hitachi Vantara. I don't read much about that.

I would rate this solution an eight out of ten. 

Disclosure: My company has a business relationship with this vendor other than being a customer. Partner
PeerSpot user
Enterprise Data Architect at a manufacturing company with 201-500 employees
Real User
Jan 3, 2022
It's flexible and can do almost anything I want it to do
Pros and Cons
  • "Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool things is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing."
  • "It has made a world of difference."
  • "Some of the scheduling features about Lumada drive me buggy. The one issue that always drives me up the wall is when Daylight Saving Time changes. It doesn't take that into account elegantly. Every time it changes, I have to do something. It's not a big deal, but it's annoying."

What is our primary use case?

We mainly use Lumada to load our operational systems into our data warehouse, but we also use it for monthly reporting out of the data warehouse, so it's to and from. We use some of Lumada's other features within the business to move data around. It's become quite the Swiss army knife.

We're primarily doing batch-type reports that go out. Not many people want to sift through data and pick it to join it in other things. There are a few, but again, I usually wind up doing it. The self-serve feature is not as big a seller to me because of our user base. Most of the people looking at it are salespeople.

Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool aspects is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing. 

The process that Lumada replaced was arcane. The sentiment among our employees, particularly the warehouse personnel, was that it was punitive. They would say, "I didn't get a bonus this month because the warehouse manager didn't like me." Now we can show them the numbers and say, "You didn't get a bonus because you were slacking off compared to everybody else." It's allowed us to be very transparent in how we're doing these tasks. Previously, that was all kept close to the vest. I want people to trust the numbers, and these tools allow me to do that because I can instantly show that the information is correct.

That is a huge win for us. When we first rolled it out, I spent a third of my time justifying the numbers. Now, I rarely have to do that. It's all there, and they can see it, so they trust what the information is. If something is wrong, it's not a case of "Why is this being computed wrong?" It's more like: "What didn't report?"

We have 200 stores that communicate to our central hub each night. If one of them doesn't send any data, somebody notices now. That wasn't the case in the past. They're saying, "Was there something wrong with the store?" instead of, "There's something wrong with the data."

With Lumada's single end-to-end data management, we no longer need some of the other tools that we developed in-house. Before that, everything was in-house. We had a build-versus-buy mentality. It simplified many aspects that we were already doing and made that process quicker. It has made a world of difference. 

This is primarily anecdotal, but there were times where I'd get an IM from one of the managers saying, "I'm looking at this in the sales meeting and calling out what somebody is saying. I want to make sure that this is what I'm seeing." I made a couple of people mad. Let's say they're no longer working for us, and we'll leave it at that. If you're not making somebody mad, you're not doing BI right. You're not asking the right questions.

Having a single platform for data management experience is crucial for me. It lets me know when something goes wrong from a data standpoint. I know when a load fails due to bad data and don't need to hunt for it. I've got a status board, so I can say, "Everything looks good this morning." I don't have to dig into it, and that has made my job easier. 

What's more, I don't waste time arguing about why the numbers on this report don't match the ones on another because it's all coming from the same place. Before, they were coming from various places, and they wouldn't match for whatever reason. Maybe there's some piece of code in one report that isn't being accounted for in the other. Now, they're all coming from the same place. So everything is on the same level.

What is most valuable?

I'm a database guy, not a programmer, so Lumada's ability to create low-code pipelines without custom coding is crucial for me. I don't need to do any Java customization. I've had to write SQL scripts and occasionally some JavaScript within it, but those are few and far between. I can do everything else within the tool itself. I got into databases because I was sick and tired of getting errors when I compiled something. 

What needs improvement?

Some of the scheduling features about Lumada drive me buggy. The one issue that always drives me up the wall is when Daylight Saving Time changes. It doesn't take that into account elegantly. Every time it changes, I have to do something. It's not a big deal, but it's annoying. That's the one issue, but I see the limitation, and it might not be easily solvable. 
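The reviewer does not say how the scheduler behaves internally, so the following is only a sketch of one common way this kind of annoyance arises: a job pinned to a fixed UTC hour drifts by an hour in local wall-clock time when the clocks change. The offsets (UTC-5 and UTC-4, as for US Eastern time), the dates around the 2024 spring change, and the helper `local_run_time` are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for a US Eastern site: UTC-5 in winter, UTC-4 in summer.
WINTER = timezone(timedelta(hours=-5))
SUMMER = timezone(timedelta(hours=-4))


def local_run_time(utc_hour, day, local_offset):
    """Return the local wall-clock time of a job pinned to a fixed UTC hour."""
    run_utc = datetime(day.year, day.month, day.day, utc_hour, tzinfo=timezone.utc)
    return run_utc.astimezone(local_offset)


# Hypothetical nightly job pinned to 07:00 UTC, checked on the day before
# and the day after the US clock change of March 10, 2024.
before = local_run_time(7, datetime(2024, 3, 9), WINTER)
after = local_run_time(7, datetime(2024, 3, 11), SUMMER)

print(before.strftime("%H:%M"))  # 02:00
print(after.strftime("%H:%M"))   # 03:00
```

If the nightly load is expected at the same local time all year, someone has to adjust the schedule by hand twice a year, which matches the "every time it changes, I have to do something" complaint.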

For how long have I used the solution?

I started working with Lumada long before it was acquired by Hitachi. It's been about 11 years now. I'm the primary person in the company who works with it. A few people know the solution tangentially. Aside from very basic elements, most tasks related to Lumada usually fall in my lap.

What do I think about the stability of the solution?

Lumada's stability and performance are pretty good. The limitations I run into are usually with the database that I'm trying to write to rather than read from. The only time I have a real issue is when an incredibly complex query takes 20 minutes to start returning data. It's sitting there going, "All right. Give me something to do." But then again, I've got it running on a machine that's got 64 gigs of memory.

What do I think about the scalability of the solution?

Scaling out our processes hasn't been a big deal. We're a relatively small shop with only a couple of production databases. We're more of a regional enterprise, and I haven't had any issues with performance yet. It's always been some other product or solution that has gotten in the way. Lumada can handle anything we throw at it. Every night I run reports on our part ledger. That includes 200 million records, and Lumada can chew through it in about an hour and a half. 

I know we can extend processing into the Spark realm if we need to. We've thought about that but never really needed it. It's something we keep in our back pocket. Someone suggested trying it out, but it never really got off the ground because other more pressing needs came up. From what I've seen, it'll scale out to whatever I need it to do. Any limitations are in the backend rather than the software. I've done some metrics on it. It's the database that I have to wait on more than the software. It's not doing a whole lot CPU-wise. My limitations are elsewhere, usually.

Right now, we have about 100 users working with Lumada. About 100 people log in to the system, but probably 200 people get reports from it. Only about 50 use the analysis tools, including the top sales managers and all of the buying group. There are also some analysts from various groups who use it constantly. 

How are customer service and support?

I'd give Lumada support a nine out of 10. It has been exceptional historically, but there was a rough patch about a year and a half ago shortly after Hitachi took over. They were in a transition period, but it has been very responsive since. I usually don't need help. When I do, I get a response the same day, and somebody's working on it. I'm not too worried about things going wrong, like an outage. I've never had that happen.

Sometimes when we do upgrades, and I'm in my test environment, I'll contact them and say, "I ran into this weird issue, and it's not doing what it should. What do you make of it?" They'll tell me, "You got to do this, that, and the other thing." They've been good about it.

Which solution did I use previously and why did I switch?

Before Lumada, we had a variety of homegrown solutions. Most of it was centered on our warehouse management system because that was our primary focus. There were also reports within the point of sale system, and the two never crossed paths. Now they're integrated. There was also an analysis tool they had before I came on board. I can't remember the name of it. The company had something, but it didn't do what they thought it would do, and the project fizzled.

Part of the problem was that they didn't have somebody in-house who understood business intelligence until they brought me on. They were very operationally focused before that. The management was like, "We need more insight into what we're doing and how we're doing it." That was phase two of the big data warehouse push. The management here is relatively conservative in that regard, so they're somewhat slow to say, "Hey. We need to do something along these lines." But when they decide to go, get out of the way because here we come.

I used a different tool at my previous job called Informatica. Lumada has less of a learning curve for deployment. Lumada was similar enough to Informatica that it's like, "Okay. This makes sense," but there were a few differences. Once I figured out the difference, it made a lot of sense to me. The entire chain of steps Lumada allows you to do is intuitive.

Informatica was a lot more tedious to use. You had to hook every column up from its source to its target. With Lumada, it's the name that matters and its position. It made aspects a whole lot easier and less tedious. Every so often, it bites me in the butt. If I get a column out of order, it'll let me know I did something wrong. But it's much less error-prone because I don't have to hook every column up from its source to its target anymore. With Informatica, there were times where I spent 20 minutes just sitting there trying not to drool on myself. It was terrible. 
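
To illustrate the name-based idea described above, here is a minimal Python sketch. It is not how Lumada or Informatica work internally; the function `map_by_name` and the column names are hypothetical and exist only to show why matching on names removes the need to wire each column by hand, and why a missing or misnamed column is reported right away.

```python
def map_by_name(source_row, target_columns):
    """Pick target fields from a source row by matching column names."""
    # Report every missing column at once instead of failing on the first one.
    missing = [col for col in target_columns if col not in source_row]
    if missing:
        raise KeyError(f"source row is missing columns: {missing}")
    # The target column order decides the output order.
    return [source_row[col] for col in target_columns]


# Hypothetical auto parts row and target layout.
source_row = {"part_no": "A-100", "qty": 4, "store": "ORL-01"}
target_columns = ["store", "part_no", "qty"]
mapped = map_by_name(source_row, target_columns)
print(mapped)
```

With explicit wiring, each of the three columns would need its own source-to-target link; here the target list alone drives the mapping.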

How was the initial setup?

Setting up Lumada was pretty straightforward. We just rolled it out and went from proof of concept to live in about a year. I was relatively new to the organization at the time and was still getting a feel for it — knowing where data was and what all these things mean. My experience at a shoe company didn't exactly translate to an auto parts business. I went to classes down in Orlando to learn the product, then we went from there and just tried it. We had a few faux pas here and there, but we knew.

What was our ROI?

Lumada has also significantly reduced our ETL development time. It depends on the project, but if someone comes to me with a new data source, I can typically integrate it within a week, whereas it used to take a month. It's a 4-to-1 reduction. It's allowed our IT department to stay lean. I worked at another company with 70 IT people, 50 of which were programmers. My current workplace has 12 people, and six are programmers. The others are UI-type developers, and there are about six database people, including me. We save the equivalent of a full-time employee, so that's anywhere from $50,000 to $75,000 a year.

What's my experience with pricing, setup cost, and licensing?

I think Lumada's price is fair compared to some of the others, like BusinessObjects, which was the other solution that I used at my previous job. BusinessObjects' price was more reasonable before SAP acquired it. They jacked the price up significantly. Oracle's OBIEE tool was also prohibitively expensive. We felt the value was much greater than the cost, and the value for the money was much better than if we had gone with other solutions.

Which other solutions did I evaluate?

We didn't consider other options besides Lumada because we are members of an auto parts trade association, and they were using the Pentaho tool before it was Hitachi to do some ETL tasks. They recommended it, so we started using it. I evaluated a couple of other ones, but they cost more than we were willing to spend to try out this type of solution. Once we figured out what it could do for us, then it's like, "Okay. Now, we can do some real work here."

What other advice do I have?

I rate Lumada nine out of 10. The aspect I like about Lumada is its flexibility. I can make it do pretty much whatever I want. It's not perfect, but I haven't run into a tool that is yet. I haven't used every aspect of it, but there's very little that I can't make it do. I haven't run into a scenario where it couldn't handle a challenge we put in front of it. It's been a solid performer for us. I rarely have a problem that is due to Lumada. The issues I have with my loads are never because of the software.

If you plan to implement Lumada, I recommend going to the classes. Don't be afraid to ask dumb questions of support because many of them used to be consultants. They've all been there, done that. One of the guys I talk to regularly lives about 80 miles to the north of me. I have a rapport with him. They're willing to go above and beyond to make you successful.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Ridwan Saeful Rohman - PeerSpot reviewer
Data Engineering Associate Manager at Zalora Group
Real User
Top 20
Jul 4, 2024
Good abstraction and useful drag-and-drop functionality but can't handle very large data amounts
Pros and Cons
  • "The abstraction is quite good."
  • "If you develop it on MacBook, it'll be quite a hassle."

What is our primary use case?

I still use this tool on a daily basis. Comparing it to my experience with other ETL tools, the system I created using this tool was quite straightforward. It involves extracting data from MySQL, exporting it to CSV, storing it on S3, and then loading it into Redshift.
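
As a rough illustration of that flow, the sketch below uses only Python's standard library. It writes rows to CSV text and builds a Redshift `COPY` statement as a string; it does not connect to MySQL, S3, or Redshift. The table, bucket, key, and IAM role names are hypothetical placeholders, and in the setup described here these steps are performed by PDI rather than by Python.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize rows (as they might come from a MySQL cursor) to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def build_copy_statement(table, bucket, key, iam_role):
    """Build a Redshift COPY statement that loads a CSV file from S3."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' "
        "FORMAT AS CSV IGNOREHEADER 1;"
    )


# Hypothetical sample data and names, for illustration only.
csv_text = rows_to_csv(["order_id", "amount"], [(1, "19.90"), (2, "5.00")])
copy_sql = build_copy_statement(
    table="analytics.orders",
    bucket="example-etl-bucket",
    key="exports/orders.csv",
    iam_role="arn:aws:iam::123456789012:role/example-redshift-load",
)
print(csv_text)
print(copy_sql)
```

The upload of the CSV file to S3 sits between the two functions and is left out because it needs an AWS client library and credentials.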

The PDI Kettle Job and Kettle Transformation are bundled by a shell script, then scheduled and orchestrated by Jenkins.
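
A minimal sketch of what such a wrapper might assemble, written in Python here rather than shell. It only builds the command line for PDI's `kitchen.sh` job runner and prints it; it does not execute anything. The installation path, job file, and the `RUN_DATE` parameter are hypothetical, and the option spellings should be checked against the documentation for the installed PDI version.

```python
import shlex


def build_kitchen_command(pdi_home, job_file, params=None, log_level="Basic"):
    """Return the argument list for running a Kettle job with kitchen.sh."""
    cmd = [f"{pdi_home}/kitchen.sh", f"-file={job_file}", f"-level={log_level}"]
    # Named parameters are passed as -param:NAME=value, in a stable order.
    for name, value in sorted((params or {}).items()):
        cmd.append(f"-param:{name}={value}")
    return cmd


# Hypothetical paths and parameter, for illustration only.
cmd = build_kitchen_command(
    pdi_home="/opt/pdi",
    job_file="/opt/etl/jobs/load_orders.kjb",
    params={"RUN_DATE": "2024-07-04"},
)
print(shlex.join(cmd))
```

A scheduler such as Jenkins could then run the resulting command as a build step and treat a non-zero exit code as a failed run.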

We continue to use this tool primarily because many of our legacy systems still rely on it. However, our new solution is mostly based on Airflow, and we are currently in the transition phase. Airflow is a data orchestration tool that predominantly uses Python for ETL processes, scheduling, and issue monitoring—all within a unified system.


How has it helped my organization?

In my current company, this solution has a limited impact as we predominantly employ it for handling older and simpler ETL tasks.

While it serves well in setting up ETL tools on our dashboard, its functionalities can now be found in several other tools available in the market. Consequently, we are planning a complete transition to Airflow, a more versatile and scalable platform. This shift is scheduled to be implemented over the next six months, aiming to enhance our ETL capabilities and align with modern data management practices.


What is most valuable?

This solution offers drag-and-drop tools with a minimal script. Even if you do not come from an IT background or have no software engineering experience, it is possible to use. It is quite intuitive, allowing you to drag and drop many functions.

The abstraction is quite good.

If you're familiar with the product itself, it has transformational abstractions and job abstractions. We can create smaller transformations in the Kettle transformation and larger ones in the Kettle job. Whether you're familiar with Python or have no scripting background at all, the product is useful.

For larger data, we use Spark.

The solution enables us to create pipelines with minimal manual or custom coding efforts. Even without advanced scripting experience, it is possible to create ETL tools. I recently trained a graduate from a management major who had no experience with SQL. Within three months, he became quite fluent, despite having no prior experience using ETL tools.

The importance of handling pipeline creation with minimal coding depends on the team. If we switch to Airflow, more time is needed to teach fluency in the ETL tool. With these product abstractions, I can compress the training time to three months. With Airflow, it would take more than six months to reach the same proficiency.

We use the solution's ability to develop and deploy data pipeline templates and reuse them.

The old system, created by someone prior to me in my organization, is still in use. It was developed a long time ago and is also used for some ad hoc reporting.

The ability to develop and deploy data pipeline templates once and reuse them is crucial to us. There are requests to create pipelines, which I then deploy on our server. The system needs to be robust enough to handle scheduling without failure.

We appreciate the automation. It's hard to imagine how data teams would work if everything were done on an ad hoc basis. Automation is essential. In my organization, 95% of our data distributions are automated, and only 5% are ad hoc. Without this solution, we would have to query data manually, process it in spreadsheets, and then distribute it within the organization. Robust automation is key.

We can easily deploy the solution on the cloud, specifically on AWS. I haven't tried it on another server. We deploy it on our AWS EC2, but we develop it on local computers, including both Windows and MacBooks.

I have personally used it on both. Developing on Windows is easier to navigate. On MacBooks, the display becomes problematic when enabling dark mode.

The solution has reduced our ETL development time compared to scripting. However, this largely depends on your experience.

What needs improvement?

Five years ago, when I had less experience with scripting, I would have definitely used this product over Airflow, as the abstraction is quite intuitive and easier for me to work with. Back then, I would have chosen this product over other tools that use pure scripting, as it would have significantly reduced the time required to develop ETL tools. However, this is no longer the case, as I now have more familiarity with scripting.

When I first joined my organization, I was still using Windows. Developing the ETL system on Windows is quite straightforward. However, when I switched to a MacBook, it became quite a hassle. To open the application, we had to first open the terminal, navigate to the solution's directory, and then run the executable file. Additionally, the display becomes quite problematic when dark mode is enabled on a MacBook.

Therefore, developing on a MacBook is quite a hassle, whereas developing on Windows is not much different from using other ETL tools on the market, like SQL Server Integration Services, Informatica, etc.

For how long have I used the solution?

I have been consistently using this tool since I joined my current company, which was approximately one year ago.

What do I think about the stability of the solution?

The performance is good. I have not tested the product at its bleeding edge. We only perform simple jobs. In terms of data, we extract it from MySQL and export it to CSV. There are only millions of data points, not billions. So far, it has met our expectations and is quite good for a smaller number of data points.

What do I think about the scalability of the solution?

I'm not sure that the product could keep up with significant data growth. It can be useful for millions of data points, but I haven't explored its capability with billions of data points. I think there are better solutions available on the market. This applies to other drag-and-drop ETL tools as well, like SQL Server Integration Services, Informatica, etc.

How are customer service and support?

We don't really use technical support. The version that we are currently using is no longer supported by their representatives. We haven't updated to the newer version yet.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We're moving to Airflow. The switch was mostly due to debugging problems. If you're familiar with SQL Server Integration Services, the ETL tools from Microsoft have quite intuitive debugging functions. You can easily identify which transformation has failed or where an error has occurred. However, in our current solution, my colleagues have reported that it is difficult to pinpoint the source of errors directly.

Airflow is highly customizable and not as rigid as our current product. We can deploy simple ETL tools as well as machine learning systems on Airflow. Airflow primarily uses Python, which our team is quite familiar with. Currently, only two out of 27 people on our team handle this solution, so not enough people know how to use it.

How was the initial setup?

There are no separations between the deployment and other teams. Each of our teams acts as individual contributors. We handle the entire implementation process, from face-to-face business meetings, setting timelines, developing the tools, and defining the requirements, to production deployment.

The initial setup is straightforward. Currently, the use of version control in our organization is quite loose. We are not using any version control software. The way we deploy it is as simple as putting the Kettle transformation file onto our EC2 server and overwriting the old file, that's it.

What's my experience with pricing, setup cost, and licensing?

I'm not really sure about the pricing of the product. I'm not involved in procurement or commissioning.

What other advice do I have?

We put it on our AWS EC2 server; however, during development, it was on our local server. We deploy it onto our EC2 server. We bundle it in our shell scripts, and the shell scripts are run by Jenkins.

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Ahad Ahmed - PeerSpot reviewer
BI developer at Jubilee Life Insurance Company Ltd
Real User
Top 5
May 29, 2024
Offers features for data integration and migration
Pros and Cons
  • "The product is user-friendly and intuitive"
  • "The solution offers features for data integration and migration. Pentaho Data Integration and Analytics allows the integration of multiple data sources into one. The product is user-friendly and intuitive to use for almost any business."
  • "Should provide additional control for the data warehouse"

What is our primary use case?

I have used the solution to gather data from multiple sources, including APIs, databases like Oracle, and web servers. There are a bunch of data providers available who can provide you with datasets to export in JSON format from clouds or APIs. 

What is most valuable?

The solution offers features for data integration and migration. Pentaho Data Integration and Analytics allows the integration of multiple data sources into one. The product is user-friendly and intuitive to use for almost any business. 

What needs improvement?

The solution should provide additional control for the data warehouse and reduce its size, as our organization's clients have expressed concerns regarding it. The vendor can focus on reducing capacity and compensate for it by enhancing product efficiency. 

For how long have I used the solution?

I have been using Pentaho Data Integration and Analytics for a year.  

How are customer service and support?

I have never encountered any issues with Pentaho Data Integration and Analytics. 

What's my experience with pricing, setup cost, and licensing?

I believe the pricing of the solution is more affordable than the competitors. 

Which other solutions did I evaluate?

I have worked with IBM DataStage along with Pentaho Data Integration and Analytics. I found the IBM DataStage interface outdated in comparison to the Pentaho tool. IBM DataStage requires the user to drag and drop the services as well as the pipelines, similar to the process in SSIS platforms. Pentaho Data Integration and Analytics is also easier to comprehend from the first use than IBM DataStage. 

What other advice do I have?

The solution's ETL capabilities make data integration tasks easier and are used to export data from a source to a destination. At my company, I am using IBM data switches and the overall IBM tech stack for compatibility among the integrations, pipelines and user levels. 

I would absolutely recommend Pentaho Data Integration and Analytics to others. I would rate the solution a seven out of ten. 

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Manager, Systems Development at a manufacturing company with 5,001-10,000 employees
Real User
Aug 7, 2022
An affordable solution that makes it simple to do some fairly complicated things, but it could be improved in terms of consistency of different transformation steps
Pros and Cons
  • "It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there."
  • "The speed of developing solutions has been the best improvement, reducing development time by days or weeks compared to using a different tool."
  • "Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step."
  • "In terms of our decision to purchase Hitachi's product services or solutions, our satisfaction level is average or on balance."

What is our primary use case?

Our primary use case is to populate a data warehouse and data marts, but we also use it for all kinds of data integration scenarios and file movement. It is almost like middleware between different enterprise solutions. We take files from our legacy app system, do some work on them, and then call SAP BAPIs, for example.

It is deployed on-premises. It gives you the flexibility to deploy it in any environment, whether on-premises or in the cloud, but this flexibility is not that important to us. We could deploy it on the cloud by spinning up a new server in AWS or Azure, but as a manufacturing facility, it is not important to us. Our customer preference is primarily to deploy things on-premises.

We usually stay one version behind the latest one. We're a manufacturing facility. So, we're very sensitive to any bugs or issues. We don't do automatic upgrades. They're a fairly manual process.

How has it helped my organization?

We've had it for a long time. So, we've realized a lot of the improvements that anybody would realize from almost any data integration product.

The speed of developing solutions has been the best improvement. It has reduced the development time and improved the speed of getting solutions deployed. The reduction in ETL development time varies by the size and complexity of the project. We probably spend days or weeks less than we would if we were using a different tool.

It is tremendously flexible in terms of adding custom code by using a variety of different languages if you want to, but we had relatively few scenarios where we needed it. We do very little custom coding. Because of the tool we're using, it is not critical. We have developed thousands of transformations and jobs in the tool.

What is most valuable?

It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there.

Its performance is a pretty close second. It is a pretty highly performant system. Its query performance on large data sets is very good.

What needs improvement?

Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step.

For how long have I used the solution?

We have been using this solution for more than 10 years.

What do I think about the stability of the solution?

Its stability is very good.

What do I think about the scalability of the solution?

Its scalability is very good. We've been running it for a long time, and we've got dozens, if not hundreds, of jobs running a day.

We probably have 200 or 300 people using it across all areas of the business. We have people in production control, finance, and what we call materials management. We have people in manufacturing, procurement, and of course, IT. It is very widely and extensively used. We're increasing its usage all the time.

How are customer service and support?

They are very good at quickly and effectively solving the issues we have brought up. Their support is well structured. They're very responsive.

Because we're very experienced in it, when we come to them with a problem, it is usually something very obscure and not necessarily easy to solve. We've had cases where when we were troubleshooting issues, they applied just a remarkable amount of time and effort to troubleshoot them.

Support seems to have very good access to development and product management as a tier-two. So, it is pretty good. I would give their technical support an eight out of ten.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We didn't have another data integration product before Pentaho.

How was the initial setup?

I installed it. It was straightforward. It took about a day and a half to get the production environment up and running. That was probably because I was e-learning as I was going. With a services engagement, I bet you would have everything up in a day.

What about the implementation team?

We used Pentaho services for two days. Our experience was very good. We worked with Andy Grohe. I don't know if he is still there or not, but he was excellent.

What was our ROI?

We have absolutely seen an ROI, but I don't have the metrics. There are analytic cases that we just weren't able to do before. Due to the relatively low cost compared to some of the other solutions out there, it has been a no-brainer.

What's my experience with pricing, setup cost, and licensing?

We did a two or three-year deal the last time we did it. As compared to other solutions, at least so far in our experience, it has been very affordable. The licensing is by component. So, you need to make sure you only license the components that you really intend to use.

I am not sure if we have relicensed after the Hitachi acquisition, but previously, multi-year renewals resulted in a good discount. I'm not sure if this is still the case.

We've had the full suite for a lot of years, and there is just the initial cost. I am not aware of any additional costs.

What other advice do I have?

If you haven't used it before, it is worth engaging services with Pentaho for initial implementation. They'll just point out a number of small foibles related to perhaps case sensitivity. They'll just save you a lot of runs through the documentation to identify different configuration points that might be relevant to you.

I would highly recommend the Data Integration product, particularly for anyone with a Java background. Most of our BI developers at this point do not have a Java background, which isn't really that important. Particularly, if you're a Java business and you're looking for extensibility, the whole solution is built in Java, which just makes certain aspects of it a little more intuitive at first.

On the data integration side, it is really a good tool. A lot of investment dollars go into big data and new tech, and often, those are not very compelling for us. We're in an environment where we have medium data, not big data.

It provides a single end-to-end data management experience from ingestion to insights, but at this point, that's not critical to us. We mostly do the data integration work in Pentaho, and then we do the visualization in another tool. The single data management experience hasn't enabled us to discontinue the use of other data management analysis delivery tools just because we didn't really have them.

We take an existing job or transformation and use that as a test. It is certainly easy enough to copy one object to another. I am not aware of a specific templating capability, but we are not really missing anything there. It is very easy for us to clone a job or transformation just by doing a Save As, and we do that extensively.

Vantara's roadmap is a little fuzzy for me. There has been quite a bit of turnover in the customer-facing roles over the last five years. We understand that there is a roadmap to move to a pure web-based solution, but it hasn't been well communicated to us.

In terms of our decision to purchase Hitachi's product services or solutions, our satisfaction level is average or on balance.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
CDE & BI Delivery Manager at a tech services company with 501-1,000 employees
Consultant
May 10, 2022
Connects to different databases, origins of data, files, and SFTP
Pros and Cons
  • "I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a staging area where I can work with all the information that I have in my production databases, then I can work with the data that I created."
  • "It is a very good tool if you need to work with data."
  • "I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector."

What is our primary use case?

I just use it as an ETL. It is a tool that helps me work with data so I can solve any of my production problems. I work with a lot of databases. Therefore, I use this tool to keep information organized. 

I work with a virtual private cloud (VPC) and VPN. If I work in the cloud, I use VPC. If I work on-premises, I work with VPNs.

How has it helped my organization?

I can create faster instructions than writing with SQL or code. Also, I am able to do some background control of the data process with this tool. Therefore, I use it as an ELT tool. I have a staging area where I can work with all the information that I have in my production databases, then I can work with the data that I created.

Right now, I am working in the business intelligence area. However, we use BI in all our companies. So, it is not only in one area. So, I create different data parts for different business units, e.g., HR, IT, sales, and marketing.

What is most valuable?

A valuable feature is the number of connectors that I have. So, I can connect to different databases, origins of data, files, and SFTP. With SQL and NoSQL databases, I can connect, put it in my instructions, send it to my staging area, and create the format. Thus, I can format all my data in just one process.

What needs improvement?

I work with different databases. I would like to work with more connectors to new databases, e.g., DynamoDB and MariaDB, and new cloud solutions, e.g., AWS, Azure, and GCP. If they had these connectors, that would be great. They could improve by building new connectors. If you have native connections to different databases, then you can make instructions more efficient and in a more natural way. You don't have to write any scripts to use that connector.

Hitachi can make a lot of improvements in the tool, e.g., in performance or latency or putting more emphasis on cloud solutions or NoSQL databases. 

For how long have I used the solution?

I have more than 15 years of experience working with it.

What do I think about the stability of the solution?

The stability depends on the version. At the beginning, it was more focused on stability. As of now, some things have been deprecated. I really don't know why. However, I have been pretty happy with the tool. It is a very good tool. Obviously, there are better tools, but Pentaho is fast and pretty easy to use. 

What do I think about the scalability of the solution?

It is scalable. 

How are customer service and support?

Their support team will receive a ticket on any failures that you might have. We have a log file that lets us review our errors, both in Windows and Unix. So, we are able to check both operating systems.

If you don't pay any license, you are not allowed to use their support at all. While I have used it a couple of times, that was more than 10 years ago. Now, I just go to their community and any Pentaho forums. I don't use the support.

Which solution did I use previously and why did I switch?

I have used a lot of ETL data integrators, such as DataStage, Informatica, Talend, Matillion, Python, and even SQL. MicroStrategy, Qlik, and Tableau have instructional features, and I try to use a lot of tools to do instructions. 

How was the initial setup?

I have built the solution. It does not change for cloud or on-premise developments. 

You create in your development environments, then you move to test. After that, you do the volume and integrity testing, then you go to UAT. Finally, you move to production. It does depend on the customer. You can thoroughly create the entire product structure as well as all the files that you need. Once you put it in production, it should work. You should have the same structure in development, test, and production.

What was our ROI?

It is free. I don't spend money on it.

It will reduce a lot of the time that you work with data.

What's my experience with pricing, setup cost, and licensing?

I use it because it is free. I download from their page for free. I don't have to pay for a license. With other tools, I have to pay for the licenses. That is why I use Pentaho.

I used to work with the complete suite of Pentaho, not only Data Integration. I used to build some solutions from scratch. I used to work with the Community version and Enterprise versions. With the Enterprise version, it is more than building cubes. I am building a BI solution that I can explore. Every time that I use Pentaho Data Integration, I never spend any money because it comes free with the tool. If you pay for the Enterprise license, Pentaho Data Integration is included. If you don't pay for it and use the Community version, Data Integration is included for free. 

Which other solutions did I evaluate?

I used to work with a reseller of Pentaho. That is why I started working with it. Also, I did some training for Pentaho at the company that I used to work for in Argentina, where we were a Platinum reseller. 

Pentaho is easy to use. You don't need to install anything. You can just open the script and start working on it. That is why I chose it. With Informatica, you need to do a server installation, but some companies might not allow some installation in their production or normal environment.

I feel pretty comfortable using the solution. I have tried to use other tools, but I always come back to Pentaho because it is easier. 

Pentaho is open source. While Informatica is a very good tool, it is pretty expensive. That is one of the biggest cons for the data team because you don't want to pay for tools that only help you do your work.

What other advice do I have?

I would rate this solution as eight out of 10. One of the best things about the solution is that it is free.

I used to sell Pentaho. It has a lot of pros and cons. From my side, there are more pros than cons. There isn't one tool that can do everything that you need, but this tool is one of those tools that helps you to complete your tasks and it is pretty integrable with other tools. So, you can switch Pentaho on and off from different tools and operating systems. You can use it in Unix, Linux, Windows, and Mac.

If you know how to develop different things and are very good at Java, you can create your own connectors. You can create a lot of things. 

It is a very good tool if you need to work with data. There isn't a database that you can't manage with this tool. You can work with it and manage all the data that you want to manage.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Ryan Ferdon - PeerSpot reviewer
Senior Data Engineer at Burgiss
Real User
Apr 4, 2022
Low-code makes development faster than with Python, but there were caching issues
Pros and Cons
  • "The fact that it's a low-code solution is valuable. It's good for more junior people who may not be as experienced with programming."
  • "With this solution, instead of it taking a week, it was reduced to an afternoon, or about three hours."
  • "If you're working with a larger data set, I'm not so sure it would be the best solution; the larger things got the slower it was and it was kind of buggy sometimes."

What is our primary use case?

We used it for ETL to transform data from flat files, CSV files, and databases. We used PostgreSQL for the connections, and then we would either import the data into our database if it came in from clients, or export it to files if clients wanted files or if a vendor needed to import the files into their database.

How has it helped my organization?

The biggest benefit is that it's a low-code solution. When you hire junior ETL developers or engineers, who may have a schooling background but no real experience with ETL or coding for ETL, it's a UI-based, low-code solution in which they can make something happen within weeks instead of, potentially, months.

Because it's low-code, while I could technically have done everything in Python alone, that would definitely have taken longer than using Pentaho. In addition, by being able to standardize pipelines to handle the onboarding process for new clients, development costs were significantly reduced. To put it in perspective, prior to my leading the effort to standardize things, it would typically take about a week to build a feed from start to finish, and sometimes more depending on how complicated it was. With this solution, instead of it taking a week, it was reduced to an afternoon, or about three hours. That was a significant difference.

Instead of paying a developer a full week's worth of work, which could be $2,500 or more, it cut it down to three hours or about $300. That's a big difference.
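
As a quick check of that arithmetic, here is a minimal Python sketch. It uses only the two approximate dollar figures quoted above; the variable names are made up for illustration.

```python
# Approximate per-feed figures quoted in the review.
week_cost = 2500       # dollars for a full week of developer time
afternoon_cost = 300   # dollars for about three hours of developer time

# Absolute and relative savings per feed.
savings = week_cost - afternoon_cost
savings_pct = 100 * savings / week_cost

print(f"Saved per feed: ${savings} ({savings_pct:.0f}%)")
```

On those numbers, each standardized feed costs roughly a tenth of what a hand-built feed did.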

What is most valuable?

The fact that it's a low-code solution is valuable. It's good for more junior people who may not be as experienced with programming. In our case, we didn't have a huge data set. We had small and medium-sized data sets, so it worked fine.

The fact that it's open source is also helpful in that, if a junior engineer knows they are going to use it in a job, they can download it themselves, locally, for free, and use test data to learn it.

My role was to use it to write one feed that could facilitate multiple clients. Given that it was an open-source, free solution, it was pretty robust in what it could do. I could make lookup tables and databases and map different clients, and I could use the same feed for 30 clients or 50 clients. It got the job done for our use case.
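
The idea of one feed driven by per-client lookup tables can be sketched in plain Python. This is not how the feed was built in Pentaho; it only illustrates the pattern, and the client IDs, column names, and the `standardize` helper are all hypothetical.

```python
import csv
import io

# Hypothetical lookup table: client -> {client's column name: standard column name}.
CLIENT_COLUMN_MAP = {
    "client_a": {"First": "first_name", "Last": "last_name"},
    "client_b": {"fname": "first_name", "lname": "last_name"},
}


def standardize(client_id, raw_csv):
    """Rename one client's columns to the shared layout using the lookup table."""
    mapping = CLIENT_COLUMN_MAP[client_id]
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [
        {mapping[col]: value for col, value in row.items() if col in mapping}
        for row in reader
    ]


# The same function serves both clients; only the lookup entry differs.
rows_a = standardize("client_a", "First,Last\nAda,Lovelace\n")
rows_b = standardize("client_b", "fname,lname\nGrace,Hopper\n")
print(rows_a)
print(rows_b)
```

Onboarding another client in this sketch means adding one entry to the lookup table rather than writing a new feed.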

In addition, you can install it wherever you need it. We had installed versions in the cloud and I also had local versions.

What needs improvement?

If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got, the slower it was.

It was kind of buggy sometimes. And when we ran the flow, it didn't go from a perceived start to end, node by node. Everything kicked off at once. That meant there were times when it would get ahead of itself and a job would fail. That was not because the job was wrong, but because Pentaho decided to go at everything at once, and something would process before it was supposed to. There were nodes you could add to make sure that, before this node kicks off, all these others have processed, but it was a bit tedious. 
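
The "everything kicks off at once" behavior and the fix of making one step wait for another can be illustrated with a small Python analogy. This is not Pentaho code; the two functions stand in for hypothetical steps, and `threading.Event` plays the role of the blocking node.

```python
import threading

results = []
lookup_loaded = threading.Event()


def load_lookup():
    """Hypothetical upstream step that must finish first."""
    results.append("lookup loaded")
    lookup_loaded.set()  # signal that the prerequisite is done


def use_lookup():
    """Hypothetical downstream step that depends on load_lookup."""
    lookup_loaded.wait()  # block here until the prerequisite has finished
    results.append("lookup used")


# Both steps are started together, with the dependent one first on purpose.
threads = [
    threading.Thread(target=use_lookup),
    threading.Thread(target=load_lookup),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```

Without the `wait()` call, the dependent step could run before its input exists, which is the kind of failure described above.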

There were also caching issues, and we had to write code to clear the cache every time we opened the program, because the cache would fill up and it wouldn't run. I don't know how hard that would be for them to fix, or if it was fixed in version 10.
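
A cache-clearing helper along those lines might look like the Python sketch below. The review does not say where the cache lived, so the directory is passed in as a parameter and the demo uses a throwaway temporary directory; `clear_cache` is a hypothetical name, not the code we actually ran.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache(cache_dir):
    """Delete everything inside cache_dir but keep the directory itself."""
    removed = 0
    for entry in Path(cache_dir).iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a real subdirectory and its contents
        else:
            entry.unlink()        # remove a file or a symlink
        removed += 1
    return removed


# Demo on a temporary directory standing in for the real cache location.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "old.cache").write_text("stale")
(demo_dir / "sub").mkdir()

removed_count = clear_cache(demo_dir)
remaining = list(demo_dir.iterdir())
print(removed_count, remaining)
```

A script like this could be run before launching the program, pointed at whatever directory the cache actually uses in a given installation.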

Also, the UI is a bit outdated, but I'm more of a fan of function over how something looks.

One other thing that would have helped with Pentaho was documentation and support on the internet: how to do things, how to set up. I think there are some sites on how to install it, and Pentaho does have a help repository, but it wasn't always the most useful.

For how long have I used the solution?

I used Hitachi Lumada Data Integration (Pentaho) for three years.

What do I think about the stability of the solution?

In terms of the stability of the solution, as I noted, I wouldn't use it for large data sets. But for small to midsize companies that are looking for a low-code solution that isn't going to break the budget, it's a great tool for them to use.

It worked and it was stable enough, once we figured out the little quirks and how to get around them. It mostly handled our production workflows without issue.

What do I think about the scalability of the solution?

I think it could scale, but only up to a point. I didn't test it on larger datasets. People I talked to who have worked on larger datasets wouldn't recommend using it, although that is hearsay.

In my former company, there were about five people in the data engineering department who were using the solution in their roles as ETL data integration specialists.

In that company, it's their go-to solution and I think it will work for everything that they need. When I was there, I tried opening pathways to different things, but there were so many feeds already on it, and it worked for what they need, and it's low-code and open source, so I think they'll stick with it. As they gain more clients they'll increase their usage of it.

How was the initial setup?

The initial setup wasn't that complicated. You have to set the job environment variables and that was probably the most complicated part, and would be especially so if you're not familiar with it. Otherwise, it was just a matter of downloading the version needed, installing it, and learning how to use the different components. Overall, it was pretty easy and straightforward.

The first time we deployed it, not knowing what we were doing, it took a couple of days, but that was mainly troubleshooting and figuring out what we were doing wrong because we hadn't used it before. After that, it would take maybe 30 minutes or an hour.

In terms of maintenance for Pentaho, one developer is typically assigned per feed. It will depend on the workflow of the company and how many feeds are needed. In our case there were five people involved.

What was our ROI?

It saved us a lot of money. Given that it's open source, the three years that I used it, and the fact that they had been using it for several years before that, a lot of money was definitely saved by using Pentaho versus something else.

What's my experience with pricing, setup cost, and licensing?

If a company is looking for an ETL solution and wants to integrate it with their tech stack but doesn't want to spend a bunch of money, Pentaho is a good solution. SSIS cores were $10,000 apiece. Although I don't know what they cost nowadays, they're expensive.

Pentaho is a nice option without having to pay an arm and a leg. We even had a complicated data set and Pentaho was able to handle pretty much every type of scenario, if we thought about it creatively enough. I would recommend it for a company in that position.

Which other solutions did I evaluate?

While the capabilities of Pentaho are good enough for light work, I've started using Alteryx Designer, and it is so much more robust in everything that you can do in real time. I've also used SSIS.

When you run something in Pentaho, you can click on it to see the output of each one, but it's hard to really change anything. For example, if I were to query data from a database and put it into a "select," if I wanted to reorganize within the select based on something like the first initial of someone's name, it provided that option. But when I would do it, sometimes it would throw an error and I'd have to run the feed again to see it.

The nodes, or the components, in Pentaho can probably do about 70 percent of what you can do in Alteryx. Don't get me wrong, Pentaho worked for what we needed it for, with just a few quirks. But as a data engineer, I'm always interested in and excited to work with new technologies that may offer different benefits. In this case, one of the benefits is that each node in Alteryx has many more capabilities in real time. I can look at the data that's coming into the node and the data that's going out. There was a way to do that in Pentaho, if you right-clicked and looked, but it would tell you the fields that were coming in and out and not necessarily the data. It's nice to be able to troubleshoot, on the spot, node-by-node, if you're having an issue. You can do that easily with Alteryx.

In addition to being able to look at data coming in and out of the node, you can also sort it easily and filter it within each data node in Alteryx, and that is something you can't do in Pentaho.

Another cool thing with Alteryx, although it's a very small difference, is that you don't have to save the workflow before you run it. Pentaho forces you to do that. Of course, it's always good to save.

What other advice do I have?

A good thing about Pentaho is that it's not that hard to learn, from an ETL perspective. Pentaho lays things out pretty intuitively in the panel: your input (flat file, CSV, or database) and then the transformation nodes.

It was a good baseline and a good open-source tool to use to learn ETL. It's good to have exposure to multiple tools because every company has different needs and, depending on their needs, it would be a different recommendation.

The lessons I learned using it: Make sure you clear the cache when you open the program. Also, if there are any critical points in your flow that are dependent upon previous nodes, make sure that you put blocking steps in. Make sure you also set up the job environment variables correctly, so that Pentaho runs.

It worked for what we did but, personally, I wouldn't use it. In the new company I'm working for, we are using large financial data sets and I'm not so sure it could handle that. I know there's an Enterprise version, but I didn't use that.

The solution can handle ingestion through to export, but you still have to have a batch or Python script to run it with an automation process. I don't know if the Lumada version has something different, but with what I was using, you were simply building the pipeline, but the pipeline outside of the program had to be scheduled and run, and we had other tools to check that the output was as expected.
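
A wrapper script of that kind might look like the following Python sketch. It assumes the job is launched with Pentaho's command-line job runner, Kitchen, using its `-file` and `-level` options, and that an exit code of 0 means success; check these against the documentation for your version. The install path and job file are hypothetical, and the scheduling itself (cron, Task Scheduler, or similar) happens outside the script.

```python
import subprocess


def build_kitchen_command(kitchen_path, job_file, log_level="Basic"):
    """Build the command line used to run one job file with Kitchen."""
    return [kitchen_path, f"-file={job_file}", f"-level={log_level}"]


def run_job(kitchen_path, job_file):
    """Run the job and report success, assuming exit code 0 means it succeeded."""
    result = subprocess.run(build_kitchen_command(kitchen_path, job_file))
    return result.returncode == 0


# Hypothetical paths, shown only to illustrate the shape of the command.
command = build_kitchen_command(
    "/opt/pentaho/data-integration/kitchen.sh",
    "/etl/jobs/client_feed.kjb",
)
print(command)
```

A scheduler would call `run_job` at the desired interval, and a separate check would still be needed to confirm that the output is as expected.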

We used version 7 for a while and we were reluctant to upgrade to version 9 because we had an 834 configuration, meaning a government-standardized feed that our developer spent two years building. There was an issue whenever we tried to run those feeds on version 9, so we held off because things were working on 7. We ended up finding out that it didn't take much work to fix the problem we were having with version 9 and, eventually, we moved to it. With every version upgrade of anything, there are going to be pros and cons.

Depending on what someone needs it for, if it's a small project and they don't want to pay for an enterprise solution, I would recommend it and give it a nine out of 10. The finicky things were a little frustrating, but the fact that it's free, can be deployed easily, and that it can fulfill a lot of things on a small scale, are plusses. If it were for a larger company that needed an enterprise solution, I wouldn't recommend it. In that case, it would be one out of 10.

For a smaller company or one with a smaller budget, a company that doesn't have highly complex ETL needs, Pentaho is definitely a great option. If a company has the budget and has really specific needs and large data sets, I would suggest looking elsewhere.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.