Aqeel UR Rehman - PeerSpot reviewer
BI Analyst at a computer software company with 51-200 employees
Real User
Top 5
May 19, 2022
Simple to use, supports custom transformations, and the open-source version can be used free of charge
Pros and Cons
  • "This solution allows us to create pipelines using a minimal amount of custom coding."
  • "My advice for anybody who is considering this product is if they're looking for any kind of custom transformation, or they're gleaning data from multiple sources and sending it to multiple destinations, I definitely recommend this tool."
  • "I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
  • "The technical support does not reply in a timely manner. The support they have in place does not work very well."

What is our primary use case?

I have used this ETL tool for working with data in projects across several different domains. My use cases include transforming data taken from an API such as PayPal's, extracting data from different sources such as Magento or other databases, and transforming all of that information.

Once the transformation is complete, we load the data into data warehouses such as Amazon Redshift.
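
As a rough, hand-coded sketch of that flow (the endpoint, credentials, cluster, and table below are hypothetical placeholders; in PDI the same flow is assembled from input, transform, and output steps rather than written as code):

```python
# Rough sketch of the extract-transform-load flow described above.
# The endpoint, credentials, and table name are hypothetical placeholders.
import requests
import psycopg2

# Extract: pull transaction records from a payments API (hypothetical endpoint).
resp = requests.get(
    "https://api.example.com/v1/transactions",
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()
records = resp.json()["transactions"]

# Transform: keep only the fields the warehouse needs, normalizing the amount.
rows = [(r["id"], r["currency"], round(float(r["amount"]), 2)) for r in records]

# Load: insert into Amazon Redshift. Redshift speaks the PostgreSQL wire
# protocol, so psycopg2 works for simple loads like this.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="warehouse", user="etl", password="<password>",
)
with conn, conn.cursor() as cur:
    cur.executemany(
        "INSERT INTO transactions (id, currency, amount) VALUES (%s, %s, %s)",
        rows,
    )
conn.close()
```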

How has it helped my organization?

There are a lot of different benefits we receive from using this solution. For example, we can easily accept data from an API and create JSON files. The integration is also very good.

I have created many data pipelines and after they are created, they can be reused on different levels.

What is most valuable?

The best feature is that it's simple to use. There are simple data transformation steps available, such as trimming data or performing different types of replacement.

This solution allows us to create pipelines using a minimal amount of custom coding. Anyone in the company can do so, and it's just a simple step. If any coding is required, then we can use JavaScript.
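
To make the minimal-coding point concrete, this is roughly the logic that built-in steps such as trim and replace spare you from writing by hand (the field names are invented for illustration):

```python
# Hypothetical hand-coded equivalent of PDI's built-in "trim" and
# "replace in string" transformation steps, applied to rows of data.
def clean_rows(rows):
    for row in rows:
        # Trim leading/trailing whitespace from a text field.
        row["customer_name"] = row["customer_name"].strip()
        # Replace a legacy status code with the current one.
        row["status"] = row["status"].replace("OLD_ACTIVE", "ACTIVE")
    return rows

sample = [{"customer_name": "  Alice  ", "status": "OLD_ACTIVE"}]
print(clean_rows(sample))  # [{'customer_name': 'Alice', 'status': 'ACTIVE'}]
```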

What needs improvement?

I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors. If there is a large amount of data then there is definitely a lag.

I would like to see a cloud-based deployment because it will allow us to easily handle a large amount of data.

For how long have I used the solution?

I have been working with Hitachi Lumada Data Integration for almost three years, across two different organizations.

What do I think about the stability of the solution?

There is definitely some lag when processing large amounts of data but, with a little improvement, it will be a good fit.

What do I think about the scalability of the solution?

This is a good product for an enterprise-level company.

We use this solution for all of our data integration jobs. It handles the transformation. As our business grows and the demand for data integration increases, our usage of this tool will also increase.

Between versions, they have added a lot of plugins.

How are customer service and support?

The technical support does not reply in a timely manner. I have filled out the support request form once or twice, asking about different things, but I have not received a reply.

The support they have in place does not work very well. I would rate them one or two out of ten.

Which solution did I use previously and why did I switch?

This business began with this product and did not use another one beforehand. I have also worked with cloud-based integration tools.

How was the initial setup?

The initial setup and deployment are straightforward.

I have deployed it on different servers and, on average, it takes an hour to complete. I have not had to read any documentation regarding installation; with my experience, we were able to set everything up.

What's my experience with pricing, setup cost, and licensing?

I primarily work on the Community Version, which is available to use free of charge. I have asked for pricing information but have not yet received a response.

What other advice do I have?

We are currently using version 8.3 but version 9 is available. More features to support big data are available in the newest release.

My advice for anybody who is considering this product is that if they're looking for any kind of custom transformation, or they're gathering data from multiple sources and sending it to multiple destinations, I definitely recommend this tool.

Overall, this is a good product and I recommend it.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Data Architect at a tech services company with 1,001-5,000 employees
Reseller
May 12, 2022
Helped us to fully digitalize a national census process, eliminating door-to-door interviews
Pros and Cons
  • "One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
  • "As a result of one of the projects that we did in the Middle East, we achieved the main goal of fully digitalizing their population census."
  • "I would like to see support for some additional cloud sources. It doesn't support Azure, for example. I was trying to do a PoC with Azure the other day but it seems they don't support it."
  • "The support from Hitachi is not the greatest, the fixing of bugs can take a really long time."

What is our primary use case?

We use it as an ETL tool. We take data from a source database and move it into a target database. We do some additional processing on our target databases as well, and then load the data into a data warehouse for reports. The end result is a data warehouse and the reports built on top of that.

We are a consulting company and we implement it for clients.

How has it helped my organization?

As a result of one of the projects that we did in the Middle East, we achieved the main goal of fully digitalizing their population census. They did previous censuses doing door-to-door surveys, but for the last census, using Pentaho Data Integration, we managed to get it all running in a fully digital way, with nothing on paper forms. No one had to go door-to-door and survey the people.

What is most valuable?

One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs.
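
As a sketch of what that can look like, the snippet below issues an anonymous PL/SQL block the way an execute-SQL style step would. The connection details and the staging_pkg procedure are hypothetical, and Python's oracledb driver is used purely for concreteness:

```python
# Hypothetical illustration of running a PL/SQL block mid-transformation.
# The DSN, credentials, and staging_pkg procedure are placeholders.
import oracledb

conn = oracledb.connect(user="etl", password="<password>",
                        dsn="dbhost:1521/ORCLPDB1")
with conn.cursor() as cur:
    # Anonymous PL/SQL block: refresh a staging table before the load runs.
    cur.execute("""
        BEGIN
            staging_pkg.refresh_customers(p_as_of => SYSDATE);
            COMMIT;
        END;
    """)
conn.close()
```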

What needs improvement?

I would like to see support for some additional cloud sources, such as Azure and Snowflake.

For how long have I used the solution?

I have been using Hitachi Lumada Data Integration for four years.

What do I think about the stability of the solution?

There have been some bugs and some weird things every now and then, but it is mostly fairly stable.

What do I think about the scalability of the solution?

If you work with relatively small data sets, it's all okay. But if you are going to use really huge data sets, then you might get into a bit of trouble, at least from what I have seen.

How are customer service and support?

The support from Hitachi is not the greatest; the fixing of bugs can take a really long time.

How would you rate customer service and support?

Neutral

How was the initial setup?

The initial setup is very straightforward compared to many other ETL tools. It takes about half a day.

We have about five users altogether: two to three developers, one to two customer-side people who run the ETLs, and one to two admins who take care of the environment itself. It doesn't require much maintenance. Occasionally someone has to restart the server or take a look at logs.

What was our ROI?

Because you can basically get Pentaho Data Integration for free, I would give the cost versus performance a pretty good rating.

Taking the census project that I mentioned earlier as an example (Pentaho Data Integration was not the only contributor; it was just one part of the whole solution), the statistical authority managed to save a huge amount of money by making the census electronic versus the traditional approach.

What's my experience with pricing, setup cost, and licensing?

You don't need the Enterprise Edition, you can go with the Community Edition. That way you can use it for free and, for free, it's a pretty good tool to use. 

If you pay for licenses, the only thing that you're getting, in addition, is customer support, which is pretty much nonexistent in any case. I would recommend going with the Community Edition.

Which other solutions did I evaluate?

I have had experience with other solutions, but for the last project we did not evaluate other options. Because we had previous experience with Pentaho Data Integration, it was pretty much a no-brainer to use it.

What other advice do I have?

Hitachi Vantara's roadmap is promising. They came up with Lumada and it seems that they do have some ideas on how to make their product a bit more attractive than it currently is.

I'm fairly satisfied with using Pentaho Data Integration. It's more or less okay. When it comes to all the other parts, like Pentaho reports and Pentaho dashboards, things could be better there.

The biggest lesson I've learned from using this solution is that a cheap, open-source tool can sometimes be even more efficient than some of the high-priced enterprise ETL tools. Overall, the solution is okay, considering the low cost. It has all of the main things that you would expect it to have, from a technical perspective.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Renan Guedert - PeerSpot reviewer
Business Intelligence Specialist at a recruiting/HR firm with 11-50 employees
Real User
Apr 20, 2022
Creates a good, visual pipeline that is easy to understand, but doesn't handle big data well
Pros and Cons
  • "Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side."
  • "Sometimes, it took a whole team about two weeks to get all the data to prepare and present it, and after the optimization of the data, it took about one to two hours to do the whole process."
  • "A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git."
  • "A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review, so if you need to do a review, you have to take pictures of the screen to show each step."

What is our primary use case?

Our principal use was to build the whole ETL and data warehousing on our projects. We created steps for collecting all the raw data from APIs, other databases, and flat files, like Excel, CSV, and JSON files, to do the whole transformation and data preparation, then model the data and load it into SQL Server and Integration Services.

For business intelligence projects, when you are extracting something from an API, it is pretty handy to have a step that transforms the JSON file returned by the API into an SQL table.
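
A rough hand-coded equivalent of that JSON-to-table step, for comparison (the endpoint, schema, and connection string are hypothetical; in the tool, the JSON input step does the flattening declaratively):

```python
# Hypothetical hand-rolled version of a "JSON input -> table output" flow:
# fetch nested JSON from an API and land it as a flat SQL Server table.
import requests
import pandas as pd
from sqlalchemy import create_engine

resp = requests.get("https://api.example.com/v1/orders", timeout=30)
resp.raise_for_status()

# Flatten the nested JSON records into tabular rows.
df = pd.json_normalize(resp.json()["orders"])

# Load into SQL Server (the connection string is a placeholder).
engine = create_engine(
    "mssql+pyodbc://etl:<password>@dbhost/warehouse"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)
df.to_sql("orders_staging", engine, if_exists="append", index=False)
```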

We use it heavily on a virtual machine running Windows. We have also installed the open-source version on the desktop.

How has it helped my organization?

Lumada provides us with a single, end-to-end data management experience from ingestion to insights. This single data management experience is pretty good because then you don't have every analyst doing their own stuff. When you have one unique tool to do that, you can keep improving as well as have good practices and a solid process to do the projects.

What is most valuable?

It is very resourceful; there is a wide variety of things that you can do. It is also pretty open, since you can put in a Python script or JavaScript for everything. If the application doesn't have a native tool for something, you can build your own using scripts, and you can build your own steps and jobs in the application. That freedom has been pretty good.

Lumada enables us to create pipelines with minimal manual coding efforts, which is the most important thing. When creating a pipeline, you can see which steps are failing in the process. You can keep up the process and debug, if you have problems. So, it creates a good, visual pipeline that makes it easy to understand what you are doing during the entire process.

What needs improvement?

There is no straightforward explanation of the bugs and errors that happen in the software. I have to search heavily on the Internet, through YouTube videos and other forums, to find out what is happening. Hitachi's own Lumada site doesn't have the best explanations of bugs, errors, and functions, so I have to turn to other sources to understand what is happening. Usually, it is someone in India or Russia who knows the answer.

A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git.

After you create a data pipeline, if the tool could generate a JSON file, or something in another language, it would simplify reviewing the steps of what we are doing. A simple flat text file could be even better, as long as it is generated by the platform itself, so people can look at it and see what is happening. You shouldn't need to download the whole project into your own Pentaho; I would like to just look at the code and see if there is something wrong.

The open-source version doesn't handle big data too well. Therefore, we have to use other kinds of technologies to manage that.

I would like it to be more accessible on Macs. Previously, I always used Linux, but some companies that I worked for used MacBooks, and I needed other tools or a virtual machine to run Pentaho there. It would be pretty good if the solution had a friendly version for Macs or Linux-based systems, like Ubuntu.

For how long have I used the solution?

I have been using it for six years, but more heavily over the last two years.

How are customer service and support?

I don't bring issues to Hitachi, since Lumada is, in a way, open source.

Once, when I had a problem with connections caused by the software, I found the issue discussed in forums on the Internet because there was some type of bug.

Which solution did I use previously and why did I switch?

At my first company, we used just Lumada. At my second company, we used a lot of QlikView, SQL, Python, and Lumada. At my third company, we used Python and SQL much more. I used Lumada just once at that company. At my new company, I don't use it at all. I just use Azure Data Factory and SQL.

With Pentaho, we finally have data pipelines. We didn't have solid data pipelines before. After the data pipelines became very solid, the team who created them became very popular in the company.

How was the initial setup?

To set things up, we used a virtual machine. It was a version we could download and unpack on the machine. You can install Pentaho with Ctrl-C and Ctrl-V because all you need is the newest version of Java. So, the setup was pretty smooth. It took an hour, at most, to deploy.

What was our ROI?

Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side.

The solution reduced our ETL development time by a lot because a whole project used to take about a month to get done previously. After having Lumada, it took just a week. For a big company in Brazil, it saves a team at least $10,000 a month.

Which other solutions did I evaluate?

I just use the ETL tool. For data visualization, we are using Power BI. For data storage, we use SQL Server, Azure, or Google BigQuery.

We are just using the open-source application for ETL. We have never looked into other tools of Hitachi because they are paid.

I know other companies that are using Alteryx, which has a friendlier user interface, but it has fewer tools, and they are more difficult to utilize. My wife uses Alteryx, and having used Lumada, I find Alteryx is not as good, because Lumada has more solutions and is open source. Alteryx, though, has more security and better support.

What other advice do I have?

Open-source tools like this are close to perfect for someone who wants simple solutions and isn't a programmer or knowledgeable about technology. In one week, you can get to grips with this solution and do your first project. In my opinion, it is the best tool for people starting out.

Lumada is a great tool. I would rate it as a straight seven out of 10. It gets the work done. The open-source version doesn't work well with big data sources, but there is a lot of flexibility and liberty to do everything you want and need. If the open-source version worked better with big data, I would give it a straight eight, since there is always room for improvement. Sometimes, debugging errors can be pretty difficult. In principle, it is a good tool for understanding everything that is going on when you are starting out in business intelligence and data engineering.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
José Orlando Maia - PeerSpot reviewer
Data Engineer at a tech vendor with 1,001-5,000 employees
MSP
Apr 20, 2022
We can parallelize the extraction from various servers simultaneously, accelerating our extraction
Pros and Cons
  • "The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country. We can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data."
  • "Using Lumada compared to using SQL manually, ETL development time is half the time it took using a basic manual transformation."
  • "Lumada could have more native connectors with other vendors, such as Google BigQuery, Microsoft OneDrive, Jira systems, and Facebook or Instagram. We would like to gather data from modern platforms using Lumada, which is a better approach. As a comparison, if you open Power BI to retrieve data, then you can get data from many vendors with cloud-native connectors, such as Azure, AWS, Google BigQuery, and Athena Redshift. Lumada should have more native connectors to help us and facilitate our job in gathering information from these new modern infrastructures and tools."
  • "Lumada could have more native connectors with other vendors, such as Google BigQuery, Microsoft OneDrive, Jira systems, and Facebook or Instagram."

What is our primary use case?

My primary use case is integrating my source systems, such as ERP and SAP systems, as well as web-based systems, with my data warehouse. For this process, I use ETL to gather and treat all the information from the source systems, then consolidate it in my data warehouse.

How has it helped my organization?

We needed to gather data from many servers at my company. We had probably 10 or 12 equivalent databases spread around the world, in countries such as Brazil, Paraguay, and Chile, with an instance in each country. These servers are Microsoft SQL Server-based. We are using Lumada to get the data from these international databases. We can parallelize the extraction from the various servers at the same time because we have the same structure, schemas, and tables on each of these SQL Server-based servers. This provides good value for us, as we can extract data in parallel at the same time, which accelerates our extraction.

In one integration process, I can retrieve data from 10 or 12 servers at the same time in one transformation. In the past, using SQL Server or other manual tools, we needed to have 10 or 12 different processes, one per server. Using Lumada in parallel accelerates our extraction. The tools that Lumada provides enable us to transform the data during this process, integrating the data in our data warehouse with good performance. 
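
Conceptually, the parallel fan-out behaves like the sketch below (host names, credentials, and the query are hypothetical placeholders; in the tool itself, the parallelism is configured in the transformation rather than coded):

```python
# Sketch of extracting the same table from several identically-structured
# SQL Server instances in parallel. Hosts, credentials, and the query are
# hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor
import pyodbc

SERVERS = ["sql-br.example.com", "sql-ar.example.com", "sql-cl.example.com"]
QUERY = "SELECT country_code, sale_date, amount FROM dbo.sales"

def extract(server):
    conn = pyodbc.connect(
        f"DRIVER={{ODBC Driver 17 for SQL Server}};SERVER={server};"
        "DATABASE=erp;UID=etl;PWD=<password>"
    )
    try:
        return conn.cursor().execute(QUERY).fetchall()
    finally:
        conn.close()

# One worker per server: all extractions run at the same time instead of
# one after another, which is the speed-up described above.
with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
    all_rows = [row for rows in pool.map(extract, SERVERS) for row in rows]
```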

Because Lumada runs on the Java virtual machine, we can deploy and operate on whatever operating system we want. We can deploy on Linux or Windows, since Lumada provides versions for both.

It is simple to deploy my ETLs because Lumada has the Pentaho Server version. I installed the desktop version so we can deploy our transformations in the repository. We install our own Lumada on a server, then we have a web interface to schedule our ETLs. We are also able to reschedule our ETLs. We can schedule the hour that we want to run our ETL processes and transformations. We can schedule how many times we want to process the data. We can save all our transformations in a repository located in a Pentaho Server. Since we have a repository, we can save many versions of our transformation, such as 1.0, 1.1, and 1.2, in the repository. I can save four or five versions of a transformation. I can ask Lumada to run only the last version that I saved in the database. 

Lumada offers a web interface to follow these transformations. We can check the logs to see whether the transformations completed successfully or whether there was a network issue or a database error. Using Lumada, there is a feature where we can get logs at execution time. We can also be notified by email whether transformations succeeded or failed. We have a file for each process that we schedule on Pentaho Server.

The area where Lumada has helped us most is the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country, we can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, our data warehouse. This improves our production performance and meets our need for information about the industry, production data, and commercial data.

What is most valuable?

The features that I use the most are the Microsoft Excel input, table input, S3 CSV input, and CSV input steps. Today, the most valuable features to me are the table input, then the CSV input; both are very important. We use the table input to extract data from our transactional databases, which are commonly used. We also use the CSV input to get data from AWS S3 and our data lake.

In Lumada, we can parallelize the steps. For me, the performance when querying databases is good, especially transactional databases. Because Lumada uses Java, we can adjust the amount of memory that we want to use for transformations. It is possible to set the amount of memory for the Java VM, which is good. Therefore, Lumada is good, especially with transactional database extraction. It has good performance when querying data, not the highest performance, but good, and it is possible to parallelize queries. For example, if we have three or four servers to get data from, then we can retrieve the data from these databases at the same time, in parallel. This is good because we don't need to wait for one extraction to finish before starting the next.

Using Lumada, we don't need to do many manual transformations because it has native components for many of them. Thus, Lumada is a low-code alternative to SQL, Python, or other transformation tools for gathering data.

What needs improvement?

Lumada could have more native connectors with other vendors, such as Google BigQuery, Microsoft OneDrive, Jira systems, and Facebook or Instagram. We would like to gather data from modern platforms using Lumada, which is a better approach. As a comparison, if you open Power BI to retrieve data, then you can get data from many vendors with cloud-native connectors, such as Azure, AWS, Google BigQuery, and Athena Redshift. Lumada should have more native connectors to help us and facilitate our job in gathering information from these new modern infrastructures and tools.

For how long have I used the solution?

I have been using Lumada Data Integration for at least four years. I started using it in 2018.

How are customer service and support?

Because we are using the free version of Lumada, we have only used community support and forums on the Internet. 

Lumada does have a paid version, which comes with specialized Lumada support from Hitachi. 

How was the initial setup?

It is simple to deploy Lumada because we can deploy a transformation in three to five simple steps, saving the transformation in a repository. 

I open the web-based Pentaho Server, then find the transformation that I deployed. I can schedule the transformation for the hour or recurrence at which I want it to run. It is easy because, at the end of the process, I can save my transformation and Lumada generates an XML file. We can send this XML file to any user of Lumada, who can open the model and get the transformation that I developed. As a deployment process, it is straightforward, simple, and not complex.
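
Because the exported artifact is plain XML, it can also be inspected outside the tool. For instance, assuming the usual .ktr layout with step elements directly under the transformation root (the file name is hypothetical):

```python
# Sketch: list the steps inside an exported PDI transformation file.
# Assumes the common .ktr layout, where <step> elements sit directly
# under the <transformation> root; the file name is a placeholder.
import xml.etree.ElementTree as ET

tree = ET.parse("load_sales.ktr")
for step in tree.getroot().findall("step"):
    print(step.findtext("name"), "->", step.findtext("type"))
```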

What was our ROI?

Using Lumada compared to using SQL manually, ETL development time is half the time it took using a basic manual transformation.

What's my experience with pricing, setup cost, and licensing?

There are more types of connectors available, but you need to pay for them. 

You need to go with the paid version to have Hitachi's specialized Lumada support. If you are using the free version, then you will have only community support. You will depend on releases from Hitachi to solve any problems or questions that you have, such as bug fixes; you will need to wait for the newest versions or releases to address them.

Which other solutions did I evaluate?

I also use Talend Data Integration. For me, Lumada is straightforward and makes it simpler to build transformations with drag and drop. Comparing Talend and Lumada, I think Lumada is easier to use and requires less effort to understand. Using some tutorials, I can learn Lumada in a day and proceed with my transformations. Talend, by contrast, is a more complex solution with more complex transformations.

In Talend's open, free version, you don't get a Talend server to deploy models to; you have to deploy Talend models directly onto your own server. If you want to schedule a transformation, you need to rely on the operating system where you have the infrastructure to run and deploy the transformations. For example, when we deployed a data model in the free version of Talend, we needed to use Windows Scheduler to schedule the Talend packages that process the data. Whereas, in the free version of Lumada, we already have the web-based server, so we can run our transformations and deploy them on the server. We can schedule them in a web interface, which guides us in scheduling the jobs and checking our logs to see how many transformations run at a time. This is the biggest difference between Talend and Lumada.

What other advice do I have?

I don't use many templates. I use the solution on a case-by-case basis.

Considering that Lumada is a free tool, I would rate it as nine out of 10 for the free version.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Jacopo Zaccariotto - PeerSpot reviewer
Head of Data Engineering at InfoCert
Real User
Apr 20, 2022
The drag-and-drop interface makes it easier to use than some competing products
Pros and Cons
  • "We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice."
  • "We've seen a 50 percent reduction in our ETL development time using the free version of Pentaho, saving about 1,000 euros per week and at least 50,000 euros annually."
  • "The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode."
  • "The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting."

What is our primary use case?

We use Pentaho for small ETL integration jobs and cross-storage analytics. It's nothing too major. We have it deployed on-premise, and we are still on the free version of the product.

In our case, processing takes place on the virtual machine where we installed Pentaho. We can ingest data from different on-premises and cloud locations. We don't yet carry out the data processing phase in an environment separate from where the VM is running.

How has it helped my organization?

At the start of my team's journey at the company, it was difficult to do cross-platform storage analytics. That means ingesting data from different analytics sources inside a single storage machine and building out KPIs and some other analytics. 

Pentaho was a good start because we can create different connections and import data. We can then run some global queries on that data from various sources. We've been able to replace some of our other data tools, like Talend, for managing our data warehouse workflow. Later, we adopted some other cloud technologies, so we don't primarily use Pentaho for those use cases anymore. 

What is most valuable?

Pentaho is flexible with a drag-and-drop interface that makes it easier to use than some other ETL products. For example, the full stack we are using in AWS does not have drag-and-drop functionality. Pentaho was a good option at the start of this journey.

We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice.

What needs improvement?

It's difficult to use custom code. Implementing a pipeline with pre-built blocks is straightforward, but it's harder to insert custom code inside the pre-built blocks. The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode.

Repository management is also a shortcoming, but I'm not sure if that's just a limitation of the free version. I'm not sure if Pentaho can use an external repository; ours is a flat-file repository inside a virtual machine. At some point, we would want to deploy this repository on a database.

Pentaho's data management covers ingestion and insights but I'm not sure if it's end-to-end management—at least not in the free version we are using—because some of the intermediate steps are missing, like data cataloging and data governance features. This is the weak spot of our Pentaho version.

For how long have I used the solution?

We implemented Hitachi Pentaho some time ago; we have been using it for around five or six years. I was using the product myself at the time, but now I am the head of the data engineering team, so I don't use it anymore. Still, I know Pentaho's strengths and weaknesses.

What do I think about the stability of the solution?

Pentaho is relatively stable, but I average about one failed job every month. 

What do I think about the scalability of the solution?

I rate Pentaho six out of 10 for scalability. The scalability depends on how you deploy it. In our case, the on-premise virtual machine is relatively small and doesn't have a lot of resources. That is why Pentaho does not handle big datasets well in our case. 

I'm also unsure if we can deploy Pentaho in the cloud. So when you're not dealing with the cloud, scalability is always limited. We cannot indefinitely pump resources into a virtual machine.

Currently, we have five or six active workflows running each night. Some of them are ingesting data from ADU. Others take data from AWS Redshift or on-premise Oracle. In terms of people, three other people on the data engineering team and I are actively using Pentaho.

Which solution did I use previously and why did I switch?

We used Talend, which is a Java-based solution and is made for people with proficiency in Java. The entire analytics ecosystem is transitioning to more flexible runtimes, including Python and other languages. Java was not ideal for our data analytics journey.

Right now, we are using NiFi, a tool in the cloud ecosystem that has a similar drag-and-drop interface, but it's embedded in the ADU framework. We're also using another drag-and-drop tool on AWS, but not AWS Glue Studio. 

What was our ROI?

We've seen a 50 percent reduction in our ETL development time using the free version of Pentaho. That saves about 1,000 euros per week, so at least 50,000 euros annually. 

What other advice do I have?

I rate Pentaho eight out of 10. It's a perfect pick for data teams that are getting started and more business-oriented data teams. It's good for a data analyst who isn't so tech-savvy. It is flexible and easy to use. 

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Dale Bloom - PeerSpot reviewer
Credit Risk Analytics Manager at MarketAxess
Real User
Jan 30, 2022
Integrates easily, significantly reduces our development time, and allows us to put as much code as we want
Pros and Cons
  • "I absolutely love Hitachi. I'm one of the forefront supporters of Hitachi for my firm. It's so easy to integrate within our environments. In terms of being able to quickly build ETL jobs, transform, and then automate them, it's really easy to integrate throughout for data analytics."
  • "When we get a question from our CEO that needs a response and that requires a little bit of legwork of pulling in from various market data, our own in-house repositories, and everything else, it allows me to arrive at the solutions much faster than having to do it through scripting in Python, coding, or anything else."
  • "In the Community edition, it would be nice to have more modules that allow you to code directly within the application. It could have R or Python completely integrated into it, but this could also be because I'm using an older version."
  • "In the Community edition, it would be nice to have more modules that allow you to code directly within the application."

What is our primary use case?

The use case is for data ETL on our various data repositories. We use it to aggregate and transform data for visualization purposes for our upper management.

Currently, I am using PDI locally on my laptop, but we are working on an integration to push this out. We have purchased the Enterprise edition and have licenses, and we are just working with our infrastructure team to get that set up on a server. 

We haven't yet launched the Enterprise edition, so I've had very minimal touch with Lumada, but I did have an overview with one of the engineers as to how to use the customer portal in terms of learning documentation. So, the documentation and support are basically the two main areas that I've been using it for. I haven't piped any data or anything through it. I've logged in a couple of times to the customer portal, and I've pretty much been using it as support functionality. I have been submitting requests to understand more about how to get everything to be working for the Enterprise edition. So, I have been using the Lumada customer portal mostly for Pentaho Data Integration.

How has it helped my organization?

When we get a question from our CEO that needs a response and that requires a little bit of legwork of pulling in from various market data, our own in-house repositories, and everything else, it allows me to arrive at the solutions much faster than having to do it through scripting in Python, coding, or anything else. I use multiple tools within my toolkit. I'm pretty heavy on Python, but I find that I can do quite a bit of pre-transformation of the data within the actual application for PDI Spoon than having to do everything through coding in Python.

It has significantly reduced our ETL development time. I can't really quantify the hours, but it's a no-brainer for me for just pumping in things. If I have a simple question to ascertain, I can pull up and create any type of job or transform to easily get the solution within minutes, as opposed to however many hours of coding it would take. My estimate is that per week, I would be spending about 75% of my time in coding external to the application, whereas, with the application itself, I can do things within a fraction of that. So, it has reduced my time from 75% to about 5%. In terms of the cost of full-time employee coding and everything, the savings would also roughly be the same, which is from 75% to 5% per week. There is also a broader impact on other colleagues within my team. Currently, their processes are fairly manual, such as Excel-based, so the time savings are carried over to them as well.

What is most valuable?

I'm at the early stages with Lumada, and I have been using the documentation quite a bit. The support has definitely been critical right now in terms of finding out more about the architectural elements needed to roll out the Enterprise edition.

I absolutely love Hitachi. I'm one of the forefront supporters of Hitachi for my firm. It's so easy to integrate within our environments. In terms of being able to quickly build ETL jobs, transform, and then automate them, it's really easy to integrate throughout for data analytics. 

I also appreciate the fact that it's not one of the low-code/no-code solutions. You can put as much JavaScript or another code into it as you want, and that makes it a really powerful tool.

What needs improvement?

I haven't been able to broach all the functionality of the Enterprise edition because it hasn't been integrated into our server. We're still building out the server, app server, and repository to support it.

In the Community edition, it would be nice to have more modules that allow you to code directly within the application. It could have R or Python completely integrated into it, but this could also be because I'm using an older version.

For how long have I used the solution?

I have been using it here for about two months. 

What do I think about the stability of the solution?

I haven't had any problems with stability. Right now, for the implementation of the Enterprise edition, we're trying to make sure that it's highly available in case anything goes down, and we have proper safety nets in place, but personally, I haven't found any issues.

What do I think about the scalability of the solution?

It seems highly scalable. I've used the product in other firms, and we've managed to work pretty coherently pushing our changes for code, revisions, and everything else to Git and things like that.

In terms of users, currently, in my firm, I'm the only user, but the intention is to push it globally for all of our users to be able to use it. 

We would like to be able to support other teams and other departments within the organization. Currently, this is being used only for our credit risk team, but in general, within risk, we have many departments such as operational risk, enterprise risk, market risk, and credit risk. I'm bridging all of them right now. However, with other teams that have expressed an interest, it also will include our settlements team and potentially even our research team and FP&A.

How are customer service and support?

So far, it's been pretty good. I would rate them an eight out of 10. 

People are fairly responsive initially to saying, "Okay, yes, we have this on our radar. Coming back." Sometimes, it might take a little bit longer for some responses, but it's still very good, and the quality is a 10 out of 10.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

At my current firm, we weren't using anything in this team. I just came in, and I knew I wanted to use this product. I had used it quite heavily at my previous firm, and it was just very easy. Even the folks who did not have prior coding experience or data ETL experience could fairly quickly learn its semantics or the ways to work with it. So, I figured that it would be a great product to push forward.

Other teams in my firm were using low-code or no-code solutions, but I just can't stand their interfaces. It's rather limited in terms of even viewing what's on the screen and what you have. I appreciate the way you can debug very quickly within PDI.

How was the initial setup?

It was pretty straightforward for me. I had no problem with configuring it. For my personal use of the product, it took an hour of my time to get it onto my machine. For the Enterprise edition, the deployment is still going on, but it's mainly because we don't have many people on our infrastructure team to help. They have multiple ongoing projects. 

The implementation strategy for my personal use case was fairly straightforward. It involved getting the Community edition and configuring it so that I can set up the pipelines for connecting to my data sources and databases and then output to a file share drive for now. All our databases are fairly read-only on our side. In terms of the implementation strategy for the Enterprise edition, we haven't gotten to the stage of completing it, but it'll work somewhat similarly. It's just that the repositories, instead of them being folder repositories, are going to be database-driven, and any code is going to be pushed to the database repository.

What about the implementation team?

We are not using any integrator or consultant for this. For its deployment and maintenance, we're rather limited in terms of the staff. We have one infrastructure person and me. I'm going to be in charge of maintaining it for the time being until I can increase my team.

What was our ROI?

When you can get things done much faster and free up people's time, it's a no-brainer.

When I came into the firm, I was using the Community edition, which is the freeware version. Because the Enterprise edition costs something, it has actually increased our costs, but as a whole, in terms of operational ability and time savings for the rest of my team, the output from PDI and everything else has only increased the value of using this product.

What's my experience with pricing, setup cost, and licensing?

The pricing has been pretty good. I'm used to using everything open-source or freeware-based. I understand that organizations need to make sure that the solutions are secure, and that's basically where I hit a roadblock in my current organization. They needed to ensure that we had a license and we had a secure way of accessing it so that no outside parties could get access to our data, but in terms of pricing, considering how much other teams are spending on cloud solutions or even their existing solutions, its price point is pretty good.

At this time, there are no additional costs. We just have the licensing fees.

What other advice do I have?

If you don't have the comfort level for the architectural build-out, then you can definitely opt for the white-glove treatment, at an additional cost of about 50,000, to help with the integration and implementation effort. We chose not to go that route. Therefore, we're using support for any of the fine-tuning questions about making it highly available and other things.

I have not used Lumada for creating pipelines. I'm using PDI to help with our data pipelines. Similarly, I am not using its ability to develop and deploy data pipeline templates at this time, and I also haven't used it for single end-to-end data management from ingestion to insight.

The biggest lesson that I have learned from using this solution is that the order of operations is critical. Other than that, it has been an absolute treat to use.

I've been espousing this product to everybody. I would rate it a 10 out of 10.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Lead, Data and BI Architect at a financial services firm with 201-500 employees
Real User
Jan 13, 2022
We can use the same tool on all our environments. The patching is buggy.
Pros and Cons
  • "Flexible deployment, in any environment, is very important to us. That is the key reason why we ended up with these tools. Because we have a very highly secure environment, we must be able to install it in multiple environments on multiple different servers. The fact that we could use the same tool in all our environments, on-prem and in the cloud, was very important to us."
  • "I love the fact that we haven't come up with a problem yet that we haven't been able to address with this tool."
  • "The testing and quality could really improve. Every time that there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that, as even the latest one was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and do some effort, it is with the quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi."
  • "The stability is not great, especially when you start patching it a lot because things get broken."

What is our primary use case?

We run the payment systems for Canada. We use it as a typical ETL tool to transfer and modify data into a data warehouse. We have many different pipelines that we have built with it.

How has it helped my organization?

I love the fact that we haven't come up with a problem yet that we haven't been able to address with this tool. I really appreciate its maturity and the breadth of its capabilities.

If we did not have this tool, we would probably have to use a whole different variety of tools, then our environment would be a lot more complicated.

We develop metadata pipelines and use them.

Flexible deployment, in any environment, is very important to us. That is the key reason why we ended up with these tools. Because we have a very highly secure environment, we must be able to install it in multiple environments on multiple different servers. The fact that we could use the same tool in all our environments, on-prem and in the cloud, was very important to us. 

What is most valuable?

Because it comes from an open-source background, it has so many different plugins. It is just extremely broad in what it can do. I appreciate that it has a very broad, wide spectrum of things that it can connect to and do. It has been around for a while, so it is mature and has a lot of things built into it. That is the biggest thing. 

The visual nature of its development is a big plus. You don't need to have very strong developers to be able to work with it.

We often have to drop down to JavaScript, but that is fine. I appreciate that it has the capability built-in. When you need to, you can drop down to a scripting language. This is important to us.

What needs improvement?

The documentation is very basic.

The testing and quality could really improve. Every time that there is a major release, we are very nervous about what is going to get broken. We have had a lot of experience with that, as even the latest one was broken. Some basic things get broken. That doesn't look good for Hitachi at all. If there is one place I would advise them to spend some money and do some effort, it is with the quality. It is not that hard to start putting in some unit tests so basic things don't get broken when they do a new release. That just looks horrible, especially for an organization like Hitachi.

For how long have I used the solution?

Overall, I have been using it for about 10 years. At my current organization, I have been using it for about seven years. It was used a little bit at my previous organization as well.

What do I think about the stability of the solution?

The stability is not great, especially when you start patching it a lot because things get broken. That is not a great look. When you start patching, you are expecting things to get fixed, not new things to get broken.

With modern programming, you build a lot of automated testing around your solution, specifically for cases like this: I changed this piece of code, so what else got broken? Obviously, they don't have a lot of unit tests built into their code. They need to start doing that because it looks horrible when they change one thing and two other things get broken. Then they release that as a commercial product, which is horrible. Last time, they somehow broke the ability to connect to databases. That is something incredibly basic. How could you release the product without even testing for that?

What do I think about the scalability of the solution?

We don't have a huge amount of data, so I can't really answer how we could scale up to very large solutions.

How are customer service and support?

Lumada's ability to quickly and effectively solve the issues we have brought up is not great. We have a support contract for the solution with Hitachi. I don't get the sense that Pentaho (and Hitachi still calls it Pentaho) is a huge center of focus for them. 

You kind of get help, but the people from whom you get help aren't necessarily super strong. It often goes around in circles forever. I eventually have to find my own solution. 

I haven't found that Hitachi's support has a depth of understanding of the solution. They can answer simple questions, but when it gets more in-depth, they have a lot of trouble answering questions. I don't think the support people have the depth of expertise to really deal with difficult questions.

I would rate them as five out of 10. They are responsive and polite. I don't feel ignored or anything like that, just the depth of knowledge isn't there.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

It was already in place here; there was no comparable solution before I got to the company.

How was the initial setup?

The initial setup was complex because we had to integrate with SAML. Even though they had some direction on that, it was really a do-it-yourself kind of thing. That was pretty complicated, so if they want to keep this product fresh, I think they have to work on making it integrate better with modern technology, like single sign-on. Every organization has that now, and Pentaho doesn't have a good story for it. But again, the platform is the part that doesn't get a lot of love.

It took us a long time to figure it out, something like two weeks.

What was our ROI?

This has reduced our ETL development time. If it wasn't for this solution, we would be doing custom coding. The reason why we are using the solution is because of its simplicity of development.

What's my experience with pricing, setup cost, and licensing?

These types of solutions are expensive, so we really appreciate what we get for our money. Though, we don't think of the solution as a top-of-the-line solution or anything like that.

Which other solutions did I evaluate?

Apache has a project going on called Apache Hop. Because Pentaho was open sourced, people have taken it and forked it. They are really modernizing the solution. As far as I know, Hitachi is not involved yet. I would highly advise them to get involved in that open-source project. It will be the next generation of Pentaho. If they get left behind, they're not going to have anything. It would be a very bad move to just ignore it. Hitachi should not ignore Apache Hop.

What other advice do I have?

I really like the data integration tool. However, it is part of a whole platform of tools, and it is obvious the other tools just don't get a lot of love. We are in it for Pentaho Data Integration (PDI) because that is what we want as our ETL tool. We use their reporting platform and stuff like that, but it is obvious that they just don't get a lot of love or concern.

I haven't looked at the roadmap that much. We are also a Google customer using BigQuery, etc. Hitachi is really just a very niche part of what we do. Therefore, we are not generally looking very seriously at what Hitachi is doing with their products nor a big investor in what Hitachi is doing.

I would recommend this specific Hitachi product to a friend or colleague, depending on their use case and need. If they have a very similar need, I would recommend it. I wouldn't be saying, "Oh, this is the best thing next to sliced bread," but say, "Hey, if this is what you need, this works well for us."

On a scale of one to 10 for recommending the product, I would rate it as seven out of 10. Overall, I would also rate it as seven out of 10.

We really appreciated the breadth of its capabilities. It is not the top-of-the-line solution, but you really get a lot for what you pay for.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Google
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
reviewer1751571 - PeerSpot reviewer
Systems Analyst at a university with 5,001-10,000 employees
Real User
Jan 3, 2022
Reuse of ETLs with metadata injection saves us development time, but the reporting side needs notable work
Pros and Cons
  • "The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs."
  • "Lumada Data Integration definitely helps with decision-making for our deans and upper executives, and the fact that we're able to reuse some of the ETLs with the metadata injection saves us time and costs while making it a pretty quick process for our developers to learn and pick up ETLs from each other."
  • "The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."
  • "The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have."

What is our primary use case?

We use it as a data warehouse between our HR system and our student system, because we don't have an application that sits in between them. It's a data warehouse that we do our reporting from.

We also have integrations to other, isolated apps within the university that we gather data from. We use it to bring that into our data warehouse as well.

How has it helped my organization?

Lumada Data Integration definitely helps with decision-making for our deans and upper executives. They are the ones who use the product the most to make their decisions. The data warehouse is the only source of information that's available for them to use, and to create that data warehouse we had to use this product.

And it has absolutely reduced our ETL development time. The fact that we're able to reuse some of the ETLs with the metadata injection saves us time and costs. It also makes it a pretty quick process for our developers to learn and pick up ETLs from each other. It's definitely easy for us to transition ETLs from one developer to another. The ETL functionality satisfies 95 percent of all our needs. 

What is most valuable?

The ETL is definitely an awesome feature of the product. It's very easy and quick to use. Once you understand the way it works, it's pretty robust.

Lumada Data Integration requires minimal coding. You can do more complex coding if you want to, because it has a scripting option you can add as a step, but we haven't found a need for that yet. We just use the steps that are available, and that is sufficient for our needs at this point. Minimal coding also makes it easier for other developers to look at the things we have developed and understand them quickly; when there is complex custom scripting, it's much harder to hand work off to another developer.

In addition, the solution's ability to quickly and effectively solve issues we've brought up has been great. We've been able to use all the available features.

Among them is the ability to develop and deploy data pipeline templates once and reuse them. The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs. The automation of data pipeline templates has also been helpful in scaling the onboarding of data.
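To make the pattern concrete, here is a minimal Python sketch of what metadata injection accomplishes conceptually. This is not PDI code (in PDI you would wire this up visually with the ETL Metadata Injection step), and the table names, columns, and field mappings below are invented for illustration.

```python
# Conceptual sketch of metadata-driven ETL: one generic template, many
# concrete pipelines produced by injecting per-source metadata. This is
# NOT PDI's API; all names below are illustrative only.
import csv
import io
from dataclasses import dataclass
from typing import Dict

@dataclass
class SourceMetadata:
    """Per-source metadata that gets 'injected' into the shared template."""
    raw_csv: str                 # inline sample data; a real job would read a file or API
    field_map: Dict[str, str]    # source column -> warehouse column
    target_table: str            # destination table in the warehouse

def run_template(meta: SourceMetadata) -> None:
    """One reusable transformation template: read, rename fields, load."""
    for row in csv.DictReader(io.StringIO(meta.raw_csv)):
        # The injected field map replaces hard-coded column handling,
        # so the same template serves every source.
        mapped = {dst: row[src] for src, dst in meta.field_map.items()}
        print(f"load into {meta.target_table}: {mapped}")  # stand-in for a DB insert

# Without injection, each source below would need its own duplicated ETL.
sources = [
    SourceMetadata("emp_no,dept\n42,Math\n",
                   {"emp_no": "employee_id", "dept": "department"}, "dw.employees"),
    SourceMetadata("sid,major\n7,Physics\n",
                   {"sid": "student_id", "major": "program"}, "dw.students"),
]
for meta in sources:
    run_template(meta)
```

The template is written once and the per-source details arrive as data, which is exactly why there are far fewer duplicated ETLs to create and maintain.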

What needs improvement?

The transition to the web-based solution has taken longer and been more tedious than we would like, and it has pulled development effort away from the reporting side of the tool. They have a reporting tool called Pentaho Business Analytics that does all the report creation based on the data integration tool. A lot of features are missing from that product because they've allocated so many of their resources to making the data integration more web-based. We would like them to focus more on the user interface for the reporting.

The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet. We have between 500 and 800 reports in our system now. We've had to maintain an external spreadsheet with IDs to identify the location of all of those reports, instead of having that built into the system. It's been frustrating for us that they can't just build a simple search feature into the product to search for report names. It needs to be more in line with other reporting tools, like Tableau. Tableau has a lot more features and functions.
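Until a built-in search exists, an external index like that spreadsheet can at least be generated rather than hand-maintained. Below is a rough Python sketch of that workaround. It assumes the Pentaho Server exposes the api/repo/files/tree REST endpoint with depth and filter parameters (recent server versions do, but verify against your version's documentation); the host, credentials, and search term are placeholders.

```python
# Sketch: list report paths from the Pentaho Server repository via REST
# and filter by name, as a substitute for the missing built-in search.
# ASSUMPTIONS: the api/repo/files/tree endpoint and its depth/filter
# parameters exist on your server version; host and credentials are fake.
import requests
import xml.etree.ElementTree as ET

BASE = "http://pentaho.example.edu:8080/pentaho"  # placeholder host

def find_reports(name_fragment: str):
    resp = requests.get(
        f"{BASE}/api/repo/files/tree",
        params={"depth": -1, "filter": "*.prpt"},  # .prpt = report bundles
        auth=("admin", "password"),                # replace with real credentials
        timeout=30,
    )
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    # Walk the returned tree and keep any repository path containing the fragment.
    for node in root.iter():
        path = node.findtext("path") or ""
        if name_fragment.lower() in path.lower():
            yield path

for hit in find_reports("enrollment"):
    print(hit)
```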

Because the reporting is lacking, only the deans and above are using it. It could be used more, and we'd like it to be used more.

Also, while the solution provides us with a single, end-to-end data management experience from ingestion to insights, it doesn't give us a full lineage of where the data comes from. If we change a field, we can't trace it from the reporting back through to the ETL field. Unfortunately, it's a manual process for us. Hitachi has a new product for that; it searches all the fields, documents, and files to map your pipeline, but we haven't bought it yet.

For how long have I used the solution?

I've been using Lumada Data Integration since version 4.2. We're now on version 9.1.

What do I think about the stability of the solution?

The stability has been great. Other than for upgrades, it has been pretty stable.

What do I think about the scalability of the solution?

The scalability is great too. We've been able to expand the current system and add a lot of customizations to it.

Surprisingly, I'm the only person in our organization who handles maintenance.

How are customer service and support?

The only issue that we've had is that it takes a little longer than we would like for support to resolve something, although things do eventually get incorporated. They're very quick to respond to an issue, but the fixing of the issue is not as quick.

For example, a few versions ago, when we upgraded it, we found that the upgrade caused a whole bunch of issues with the Oracle data types and the way the ETL was working with them. It wasn't transforming to the data types properly, the way we were expecting it to. In the previous version that we were using it was working fine, but the upgrade caused the issue, and it took them a while to fix that.
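A check like the following could catch that class of regression before it reaches production. This is only a sketch of our own devising, not anything Hitachi provides: snapshot the declared column types of the warehouse schema before the upgrade, snapshot again afterward, and diff. It should work with any Python DB-API Oracle driver (such as cx_Oracle or python-oracledb); the schema owner name is a placeholder.

```python
# Sketch: snapshot Oracle column data types to JSON and diff two
# snapshots, to catch silent type changes after an ETL/tool upgrade.
# Works with any DB-API connection; "DW_OWNER" is a placeholder schema.
import json

TYPE_QUERY = """
    SELECT table_name, column_name, data_type, data_length
    FROM all_tab_columns
    WHERE owner = :owner
    ORDER BY table_name, column_id
"""

def snapshot_types(conn, owner: str, path: str) -> None:
    """Dump (table, column, type, length) rows for one schema to a file."""
    cur = conn.cursor()
    cur.execute(TYPE_QUERY, {"owner": owner})
    with open(path, "w") as fh:
        json.dump([list(r) for r in cur.fetchall()], fh, indent=2)

def diff_snapshots(before_path: str, after_path: str):
    """Yield every column whose declared type changed between snapshots."""
    def load(p):
        return {tuple(r[:2]): r[2:] for r in json.load(open(p))}
    before, after = load(before_path), load(after_path)
    for key, old in before.items():
        if after.get(key) != old:
            yield key, old, after.get(key)

# Usage: run snapshot_types(conn, "DW_OWNER", "before.json") pre-upgrade,
# again to "after.json" post-upgrade, then print(list(diff_snapshots(...))).
```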

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We didn't have another tool. This is the only tool we have used to create the data warehouse between the two systems. When we started looking at solutions, this one was great because it was open source and Java-based, and it had a Community Edition. But we actually purchased the Enterprise Edition.

How was the initial setup?

I came in after it was purchased and after the first deployment.

What's my experience with pricing, setup cost, and licensing?

We renew our license every two years. When I spoke to the project manager, he indicated that the pricing has been going up every two years. It's going to reach a point where, eventually, we're going to have to look at alternative solutions because of the price.

When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho. When they bought it, the price shot up. They said the increase is because of all the improvements they put into the product and the support that they're providing. From our point of view, their improvements are mostly on the data integration part of it, instead of the reporting part of it, and we aren't particularly happy with that.

Which other solutions did I evaluate?

I've used Tableau and other reporting tools, and Tableau sticks out because its reporting is much nicer. Tableau has its drawbacks on the ETL side, though, because you can only use Tableau datasets. You have to get your data into a Tableau file dataset, and from then on the ETL part is stuck in Tableau forever.

If we could use the Pentaho ETL and the Tableau reporting we'd be happy campers.

What other advice do I have?

It's a great product. The ETL part of the product is really easy to pick up and use. It has a graphical interface with the ability to be more complex via scripting and features that you can add.

Looking at Hitachi Vantara's roadmap, the ability to upgrade more easily is the element that is most important to us. They're also moving toward web-based solutions instead of local client development tools. If it does go to the web and works the same way it works on the client, that would be a nice feature. Currently, because we have these local client development tools, we have to give our developers a VM client, which makes things a little trickier. If they put it on the web, all our developers would be able to develop from any desktop through a browser.

When it comes to the query performance of the solution on large datasets, we haven't had any issues with it. We have one table in our data warehouse that has about 120 million rows and we haven't had any performance issues.

The solution gives you the flexibility to deploy it in any environment, whether on-prem or in the cloud. With our particular implementation, we've done a lot of customizations. We have special things that we bolted onto the product, so it's not as easy to put it onto the cloud for us. All of our customizations and bolt-ons end up costing us more because they make upgrades more difficult and time-consuming. We don't use an automated upgrade process. It's manual. We have to do a full reinstall and then apply all our bolt-ons and make sure it still works. If we could automate that process it would certainly reduce our costs.
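For what it's worth, the automation we have in mind would look roughly like the sketch below: keep every bolt-on in a version-controlled repository with a manifest, and reapply the whole set onto each fresh install. All paths and the manifest format are hypothetical; nothing like this ships with the product.

```python
# Sketch: reapply tracked customizations ("bolt-ons") onto a fresh
# Pentaho install after a manual reinstall. Paths and manifest format
# are hypothetical; this is not a vendor-provided upgrade tool.
import json
import shutil
from pathlib import Path

FRESH_INSTALL = Path("/opt/pentaho-server")         # clean new install
BOLT_ON_REPO = Path("/srv/pentaho-customizations")  # version-controlled bolt-ons

def reapply_bolt_ons(manifest_file: Path) -> None:
    """Copy each tracked customization into the fresh install tree."""
    manifest = json.loads(manifest_file.read_text())
    # Each entry: {"src": "jdbc/ojdbc8.jar", "dst": "tomcat/lib/ojdbc8.jar"}
    for entry in manifest["files"]:
        src = BOLT_ON_REPO / entry["src"]
        dst = FRESH_INSTALL / entry["dst"]
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        print(f"reapplied {src} -> {dst}")

reapply_bolt_ons(BOLT_ON_REPO / "manifest.json")
```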

In terms of updating to version 9.2, which is the latest version, we're going to look into it next year and see what level of effort is required and determine how it impacts our current system. They release a new update about every six months, and there is a major release every year or two, so it's quite a fast schedule for updates.

Overall, I would rate our satisfaction with our decision to purchase Hitachi products as a seven out of 10. I would definitely recommend the data integration tool but I wouldn't recommend the reporting tool.

Which deployment model are you using for this solution?

On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.