Pentaho Data Integration and Analytics Reviews and Pricing

Jacopo Zaccariotto

Head of Data Engineering at InfoCert

Apr 20, 2022

Download

The drag-and-drop interface makes it easier to use than some competing products

Pros and Cons

"We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice."

"The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode."

What is our primary use case?

We use Pentaho for small ETL integration jobs and cross-storage analytics. It's nothing too major. We have it deployed on-premise, and we are still on the free version of the product.

In our case, processing takes place on the virtual machine where we installed Pentaho. We can ingest data from different on-premises and cloud locations. We still don't carry out the data processing phase inside a different environment from where the VM is running.

How has it helped my organization?

At the start of my team's journey at the company, it was difficult to do cross-platform storage analytics. That means ingesting data from different analytics sources inside a single storage machine and building out KPIs and some other analytics.

Pentaho was a good start because we can create different connections and import data. We can then do some global queries on that data from various sources. We've been able to replace some of our other data tools like Talend for our managing data warehouse workflow. Later, we adopted some other cloud technologies, so we don't primarily use Pentaho for those use cases anymore.

What is most valuable?

Pentaho is flexible with a drag-and-drop interface that makes it easier to use than some other ETL products. For example, the full stack we are using in AWS does not have drag-and-drop functionality. Pentaho was a good option at the start of this journey.

We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice.

What needs improvement?

It's difficult to use custom code. Implementing a pipeline with pre-built blocks is straightforward, but it's harder to insert custom code inside the pre-built blocks. The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode.

Repository management is also a shortcoming, but I'm not sure if that's just a limitation of the free version. I'm not sure if Pentaho can use an external repository. It's a flat-file repository inside a virtual machine. Back in the day, we would want to deploy this repository on a database.

Pentaho's data management covers ingestion and insights but I'm not sure if it's end-to-end management—at least not in the free version we are using—because some of the intermediate steps are missing, like data cataloging and data governance features. This is the weak spot of our Pentaho version.

Buyer's Guide

Pentaho Data Integration and Analytics

September 2025

Free Report: Pentaho Data Integration and Analytics Reviews and More

Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: September 2025.

DOWNLOAD NOW

868,759 professionals have used our research since 2012.

For how long have I used the solution?

We implemented Hitachi Pentaho some time ago. We have been using it for around five or six years. I was using the product at the time, but now I am the head of the data engineering team, so I don't use it anymore but I know Pentaho's strengths and weaknesses.

What do I think about the stability of the solution?

Pentaho is relatively stable, but I average about one failed job every month.

What do I think about the scalability of the solution?

I rate Pentaho six out of 10 for scalability. The scalability depends on how you deploy it. In our case, the on-premise virtual machine is relatively small and doesn't have a lot of resources. That is why Pentaho does not handle big datasets well in our case.

I'm also unsure if we can deploy Pentaho in the cloud. So when you're not dealing with the cloud, scalability is always limited. We cannot indefinitely pump resources into a virtual machine.

Currently, we have five or six active workflows running each night. Some of them are ingesting data from ADU. Others take data from AWS Redshift or on-premise Oracle. In terms of people, three other people on the data engineering team and I are actively using Pentaho.

Which solution did I use previously and why did I switch?

We used Talend, which is a Java-based solution and is made for people with proficiency in Java. The entire analytics ecosystem is transitioning to more flexible runtimes, including Python and other languages. Java was not ideal for our data analytics journey.

Right now, we are using NiFi, a tool in the cloud ecosystem that has a similar drag-and-drop interface, but it's embedded in the ADU framework. We're also using another drag-and-drop tool on AWS, but not AWS Glue Studio.

What was our ROI?

We've seen a 50 percent reduction in our ETL development time using the free version of Pentaho. That saves about 1,000 euros per week, so at least 50,000 euros annually.

What other advice do I have?

I rate Pentaho eight out of 10. It's a perfect pick for data teams that are getting started and more business-oriented data teams. It's good for a data analyst who isn't so tech-savvy. It is flexible and easy to use.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Dale Bloom

Credit Risk Analytics Manager at MarketAxess

Jan 30, 2022

Download

Integrates easily, significantly reduces our development time, and allows us to put as much code as we want

Pros and Cons

"I absolutely love Hitachi. I'm one of the forefront supporters of Hitachi for my firm. It's so easy to integrate within our environments. In terms of being able to quickly build ETL jobs, transform, and then automate them, it's really easy to integrate throughout for data analytics."

"In the Community edition, it would be nice to have more modules that allow you to code directly within the application. It could have R or Python completely integrated into it, but this could also be because I'm using an older version."

What is our primary use case?

The use case is for data ETL on our various data repositories. We use it to aggregate and transform data for visualization purposes for our upper management.

Currently, I am using the PDI locally on my laptop, but we are undergoing an integration to push this off. We have purchased the Enterprise edition and have licenses, and we are just working with our infrastructure to get that set up on a server.

We haven't yet launched the Enterprise edition, so I've had very minimal touch with Lumada, but I did have an overview with one of the engineers as to how to use the customer portal in terms of learning documentation. So, the documentation and support are basically the two main areas that I've been using it for. I haven't piped any data or anything through it. I've logged in a couple of times to the customer portal, and I've pretty much been using it as support functionality. I have been submitting requests to understand more about how to get everything to be working for the Enterprise edition. So, I have been using the Lumada customer portal mostly for Pentaho Data Integration.

How has it helped my organization?

When we get a question from our CEO that needs a response and that requires a little bit of legwork of pulling in from various market data, our own in-house repositories, and everything else, it allows me to arrive at the solutions much faster than having to do it through scripting in Python, coding, or anything else. I use multiple tools within my toolkit. I'm pretty heavy on Python, but I find that I can do quite a bit of pre-transformation of the data within the actual application for PDI Spoon than having to do everything through coding in Python.

It has significantly reduced our ETL development time. I can't really quantify the hours, but it's a no-brainer for me for just pumping in things. If I have a simple question to ascertain, I can pull up and create any type of job or transform to easily get the solution within minutes, as opposed to however many hours of coding it would take. My estimate is that per week, I would be spending about 75% of my time in coding external to the application, whereas, with the application itself, I can do things within a fraction of that. So, it has reduced my time from 75% to about 5%. In terms of the cost of full-time employee coding and everything, the savings would also roughly be the same, which is from 75% to 5% per week. There is also a broader impact on other colleagues within my team. Currently, their processes are fairly manual, such as Excel-based, so the time savings are carried over to them as well.

What is most valuable?

I'm at the early stages with Lumada, and I have been using the documentation quite a bit. The support has definitely been critical right now in terms of trying to find out more about the architectural elements that need to go in for pushing the Enterprise edition.

I absolutely love Hitachi. I'm one of the forefront supporters of Hitachi for my firm. It's so easy to integrate within our environments. In terms of being able to quickly build ETL jobs, transform, and then automate them, it's really easy to integrate throughout for data analytics.

I also appreciate the fact that it's not one of the low-code/no-code solutions. You can put as much JavaScript or another code into it as you want, and that makes it a really powerful tool.

What needs improvement?

I haven't been able to broach all the functionality of the Enterprise edition because it hasn't been integrated into our server. We're still building out the server, app server, and repository to support it.

In the Community edition, it would be nice to have more modules that allow you to code directly within the application. It could have R or Python completely integrated into it, but this could also be because I'm using an older version.

For how long have I used the solution?

I have been using it here for about two months.

What do I think about the stability of the solution?

I haven't had any problems with stability. Right now, for the implementation of the Enterprise edition, we're trying to make sure that it's highly available in case anything goes down, and we have proper safety nets in place, but personally, I haven't found any issues.

What do I think about the scalability of the solution?

It seems highly scalable. I've used the product in other firms, and we've managed to work pretty coherently pushing our changes for code, revisions, and everything else to Git and things like that.

In terms of users, currently, in my firm, I'm the only user, but the intention is to push it globally for all of our users to be able to use it.

We would like to be able to support other teams and other departments within the organization. Currently, this is being used only for our credit risk team, but in general, within risk, we have many departments such as operational risk, enterprise risk, market risk, and credit risk. I'm bridging all of them right now. However, with other teams that have expressed an interest, it also will include our settlements team and potentially even our research team and FP&A.

How are customer service and support?

So far, it's been pretty good. I would rate them an eight out of 10.

People are fairly responsive initially to saying, "Okay, yes, we have this on our radar. Coming back." Sometimes, it might take a little bit longer for some responses, but it's still very good, and the quality is a 10 out of 10.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

At my current firm, we weren't using anything in this team. I just came in, and I knew I wanted to use this product. I had used it quite heavily at my previous firm, and it was just very easy. Even the folks who did not have prior coding experience or data ETL experience could fairly quickly learn its semantics or the ways to work with it. So, I figured that it would be a great product to push forward.

Other teams in my firm were using low-code or no-code solutions, but I just can't stand their interfaces. It's rather limited in terms of even viewing what's on the screen and what you have. I appreciate the way you can debug very quickly within PDI.

How was the initial setup?

It was pretty straightforward for me. I had no problem with configuring it. For my personal use of the product, it took an hour of my time to get it onto my machine. For the Enterprise edition, the deployment is still going on, but it's mainly because we don't have many people on our infrastructure team to help. They have multiple ongoing projects.

The implementation strategy for my personal use case was fairly straightforward. It involved getting the Community edition and configuring it so that I can set up the pipelines for connecting to my data sources and databases and then output to a file share drive for now. All our databases are fairly read-only on our side. In terms of the implementation strategy for the Enterprise edition, we haven't gotten to the stage of completing it, but it'll work somewhat similarly. It's just that the repositories, instead of them being folder repositories, are going to be database-driven, and any code is going to be pushed to the database repository.

What about the implementation team?

We are not using any integrator or consultant for this. For its deployment and maintenance, we're rather limited in terms of the staff. We have one infrastructure person and me. I'm going to be in charge of maintaining it for the time being until I can increase my team.

What was our ROI?

When you can get things done much faster and free up people's time, it's a no-brainer.

When I came into the firm, I was using the Community edition, which is the freeware version. Because the Enterprise edition costs something, it has actually increased our costs, but as a whole, in terms of operational ability and time savings for the rest of my team, the output from PDI and everything else has only increased the value of using this product.

What's my experience with pricing, setup cost, and licensing?

The pricing has been pretty good. I'm used to using everything open-source or freeware-based. I understand that organizations need to make sure that the solutions are secure, and that's basically where I hit a roadblock in my current organization. They needed to ensure that we had a license and we had a secure way of accessing it so that no outside parties could get access to our data, but in terms of pricing, considering how much other teams are spending on cloud solutions or even their existing solutions, its price point is pretty good.

At this time, there are no additional costs. We just have the licensing fees.

What other advice do I have?

If you don't have the comfort level for the architectural build-out, then you can definitely opt for the white gloves treatment with an additional cost of about 50,000 to help with the integration and implementation effort of it. We chose not to go that route. Therefore, we're using support for any of the fine-tuning questions about making it highly available and other things.

I have not used Lumada for creating pipelines. I'm using PDI to help with our data pipelines. Similarly, I am not using its ability to develop and deploy data pipeline templates at this time, and I also haven't used it for single end-to-end data management from ingestion to insight.

The biggest lesson that I have learned from using this solution is that the order of operations is critical. Other than that, it has been an absolute treat to use.

I've been espousing this product to everybody. I would rate it a 10 out of 10.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.

Buyer's Guide

Pentaho Data Integration and Analytics

September 2025

Free Report: Pentaho Data Integration and Analytics Reviews and More

Learn what your peers think about Pentaho Data Integration and Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: September 2025.

DOWNLOAD NOW

868,759 professionals have used our research since 2012.

RicardoDíaz

COO / CTO at a tech services company with 11-50 employees

Jun 8, 2022

Download

We can create pipelines with minimal manual or custom coding, and we can quickly implement what we need with its drag-and-drop interface

Pros and Cons

"Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that."

"In terms of the flexibility to deploy in any environment, such as on-premise or in the cloud, we can do the cloud deployment only through virtual machines. We might also be able to work on different environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. We can only do it through virtual machines, which is a problem, but we can manage it. We also work with Databricks because it works with Spark. We can work with clustered servers, and we can easily do the deployment in the cloud. With a right-click, we can deploy Databricks through the app on AWS or Azure cloud."

What is our primary use case?

We are a service delivery enterprise, and we have different use cases. We deliver solutions to other enterprises, such as banks. One of the use cases is for real-time analytics of the data we work with. We take CDC data from Oracle Database, and in real-time, we generate a product offer for all the products of a client. All this is in real-time. The client could be at the ATM or maybe at an agency, and they can access the product offer.

We also use Pentaho within our organization to integrate all the documents and Excel spreadsheets from our consultants and have a dashboard for different hours for different projects.

In terms of version, currently, Pentaho Data Integration is on version 9, but we are using version 8.2. We have all the versions, but we work with the most stable one.

In terms of deployment, we have two different types of deployments. We have on-prem and private cloud deployments.

How has it helped my organization?

I work with a lot of data. We have about 50 terabytes of information, and working with Pentaho Data Integration along with other databases is very fast.

Previously, I had three people to collect all the data and integrate all Excel spreadsheets. To give me a dashboard with the information that I need, it took them a day or two. Now, I can do this work in about 15 minutes.

It enables us to create pipelines with minimal manual coding or custom coding efforts, which is one of its best features. Pentaho is one of the few tools with which you can do anything you can imagine. Our business is changing all the time, and it is best for our business if I can use less time to develop new pipelines.

It provides the ability to develop and deploy data pipeline templates once and reuse them. I use them at least once a day. It makes my daily life easier when it comes to data pipelines.

Previously, I have used other tools such as Integration Services from Microsoft, Data Services for SAP, and Informatica. Pentaho reduces the ETL implementation time by 5% to 50%.

What is most valuable?

Pentaho from Hitachi is a suite of different tools. Pentaho Data Integration is a part of the suite, and I love the drag-and-drop functionality. It is the best.

Its drag-and-drop interface lets me and my team implement all the solutions that we need in our company very quickly. It's a very good tool for that.

What needs improvement?

Their client support is very bad. It should be improved. There is also not much information on Hitachi forums or Hitachi web pages. It is very complicated.

In terms of the flexibility to deploy in any environment, such as on-premise or in the cloud, we can do the cloud deployment only through virtual machines. We might also be able to work on different environments through Docker or Kubernetes, but we don't have an Azure app or an AWS app for easy deployment to the cloud. We can only do it through virtual machines, which is a problem, but we can manage it. We also work with Databricks because it works with Spark. We can work with clustered servers, and we can easily do the deployment in the cloud. With a right-click, we can deploy Databricks through the app on AWS or Azure cloud.

For how long have I used the solution?

I have been using Pentaho Data Integration for 12 years. The first version that I tested and used was 3.2 in 2010.

How are customer service and support?

Their technical support is not good. I would rate them 2 out of 10 because they don't have good technical skills to solve problems.

How would you rate customer service and support?

Negative

How was the initial setup?

It is very quick and simple. It takes about five minutes.

What other advice do I have?

I have a good knowledge of this solution, and I would highly recommend it to a friend or colleague.

It provides a single, end-to-end data management experience from ingestion to insights, but we have to create different pipelines to generate the metadata management. It's a little bit laborious to work with Pentaho, but we can do that.

I've heard a lot of people say it's complicated to use, but Pentaho is one of the few tools where you can do anything you can imagine. It is very good and quite simple, but you need to have the right knowledge and the right people to handle the tool. The skills needed to create a business intelligence solution or a data integration solution with Pentaho are problem-solving logic and maybe database knowledge. You can develop new steps, and you can develop new functionality in Pentaho Lumada, but you must have the knowledge of advanced Java programming. Our experience, in general, is very good.

Overall, I am satisfied with our decision to purchase Hitachi's product services and solutions. My satisfaction level is at an eight out of ten.

I am not much aware of the roadmap of Hitachi Vantara. I don't read much about that.

I would rate this solution an eight out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer. Partner

reviewer995501455

Solution Integration Consultant II at a tech vendor with 201-500 employees

Jun 8, 2022

Download

Reduces the effort required to build sophisticated ETLs

Pros and Cons

"We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues in respect to deployment of code or with code working in one environment but not working in another environment. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines."

"It could be better integrated with programming languages, like Python and R. Right now, if I want to run a Python code on one of my ETLs, it is a bit difficult to do. It would be great if we have some modules where we could code directly in a Python language. We don't really have a way to run Python code natively."

What is our primary use case?

My work primarily revolves around data migration and data integration for different products. I have used them in different companies, but for most of our use cases, we use it to integrate all the data that needs to flow into our product. Also, we can have outbound from our product when we need to send to different, various integration points. We use this product extensively to build ETLs for those use cases.

We are developing ETLs for the inbound data into the product as well as outbound to various integration points. Also, we have a number of core ETLs written on this platform to enhance our product.

We have two different modes that we offer: one is on-premises and the other is on the cloud. On the cloud, we have an EC2 instance on AWS, then we have installed that EC2 instance and we call it using the ETL server. We also have another server for the application where the product is installed.

We use version 8.3 in the production environment, but in the dev environment, we use version 9 and onwards.

How has it helped my organization?

We have been able to reduce the effort required to build sophisticated ETLs. Also, we now are in the migration phase from an on-prem product to a cloud-native application.

We use Lumada’s ability to develop and deploy data pipeline templates once and reuse them. This is very important. When the entire pipeline is automated, we do not have any issues in respect to deployment of code or with code working in one environment but not working in another environment. We have saved a lot of time and effort from that perspective because it is easy to build ETL pipelines.

What is most valuable?

The metadata injection feature is the most valuable because we have used it extensively to build frameworks, where we have used it to dynamically generate code based on different configurations. If you want to make a change at all, you do not need to touch the actual code. You just need to make some configuration changes and the framework will dynamically generate code for that as per your configuration.

We have a UI where we can create our ETL pipelines as needed, which is a key advantage for us. This is very important because it reduces the time to develop for a given project. When you need to build the whole thing using code, you need to do multiple rounds of testing. Therefore, it helps us to save some effort on the QA side.

Hitachi Vantara's roadmap has a pretty good list of features that they have been releasing with every new version. For instance, in version 9, they have included metadata injection for some of the steps. The most important elements of this roadmap to our organization’s strategy are the data-driven approach that this product is taking and the fact that we have a very low-code platform. Combining these two is what gives us the flexibility to utilize this software to enhance our product.

What needs improvement?

It could be better integrated with programming languages, like Python and R. Right now, if I want to run a Python code on one of my ETLs, it is a bit difficult to do. It would be great if we have some modules where we could code directly in a Python language. We don't really have a way to run Python code natively.

For how long have I used the solution?

I have been working with this tool for five to six years.

What do I think about the stability of the solution?

They are making it a lot more stable. Earlier, stability used to be an issue when it was not with Hitachi. Now, we don't see those kinds of issues or bugs within the platform because it has become far more stable. Also, we see a lot of new big data features, such as connecting to the cloud.

What do I think about the scalability of the solution?

Lumada is flexible to deploy in any environment, whether on-premises or the cloud, which is very important. When we are processing data in batches on certain days, e.g., at the end of the week or month, we might have more data and need more processing power or RAM. However, most times, there might be very minimal usage of that CPU power. In that way, the solution has helped us to dynamically scale up, then scale down when we see that we have more data that we need to process.

The scalability is another key advantage of this product versus some of the others in the market since we can tweak and modify a number of parameters. We are really impressed with the scalability.

We have close to 80 people who are using this product actively. Their roles go all the way from junior developers to support engineers. We also have people who have very little coding knowledge and are more into the management side of things utilizing this tool.

How are customer service and support?

I haven't been part of any technical support discussions with Hitachi.

Which solution did I use previously and why did I switch?

We are very satisfied with our decision to purchase Hitachi's product. Previously, we were using another ETL service that had a number of limitations. It was not a modern ETL service at all. For anything, we had to rely on another third-party software. Then, with Hitachi Lumada, we don't have to do that. In that way, we are really satisfied with the orchestration or cloud-native steps that they offer. We are really happy on those fronts.

We were using something called Actian Services, which had less features and it ended up costing more than the enterprise edition of Pentaho.

We could not do a number of things on Actian. For instance, we were unable to call other APIs or connect to an S3 bucket. It was not a very modern solution. Whereas, with Pentaho, we could do all these things as well as have great marketplaces where we could find various modules and third-party plugins. Those features were simply not there in the other tool.

How was the initial setup?

The initial setup was pretty straightforward.

What about the implementation team?

We did not have any issues configuring it, even in my local machine. For the enterprise edition, we have a separate infrastructure team doing that. However, for at least the community edition, the deployment is pretty straightforward.

What was our ROI?

We have seen at least 30% savings in terms of effort. That has helped us to price our service and products more aggressively in the market, helping us to win more clients.

It has reduced our ETL development time. Per project, it has reduced by around 30% to 35%.

We can price more aggressively. We were actually able to win projects because we had great reusability of ETLs. A code that was used for one client can be reused with very minimal changes. We didn't have any upfront cost for kick-starting projects using the Community edition. It is only the Enterprise edition that has a cost.

What's my experience with pricing, setup cost, and licensing?

For most development tasks, the Enterprise edition should be sufficient. It depends on the type of support that you require for your production environment.

Which other solutions did I evaluate?

We did evaluate SSIS since our database is based on Microsoft SQL server. SSIS comes with any purchase of an SQL Server license. However, even with SSIS, there were some limitations. For example, if you want to build a package and reuse it, SSIS doesn't provide the same kinds of abilities that Pentaho does. The amount of reusability reduces when we try to build the same thing using SSIS. Whereas, in Pentaho, we could literally reuse the same code by using some of its features.

SSIS comes with the SQL Server and is easier to maintain, given that there are far more people who would have knowledge of SSIS. However, if I want to do a PCP encryption or make an API connection, it is difficult. To create a reusable package is not that easy, which would be the con for SSIS.

What other advice do I have?

The query performance depends on the database. It is more likely to be good if you have a good database server with all the indexes and bells and whistles of a database. However, from a data integration tool perspective, I am not seeing any issues with respect to query performance.

We do not build visualization features that much with Hitachi. For the reporting purposes, we have been using one of the tools from the product, then prepare the data accordingly.

We use this for all the projects that we are currently running. Going forward, we will be sticking only to using this ETL tool.

We haven't had any roadblocks using Lumada Data Integration.

On a scale of one to 10, I would recommend Hitachi Vantara to a friend or colleague as a nine.

If you need to build ETLs quickly in a low-code environment, where you don't want to spend a lot of time on the development side of things but it is a little difficult to find resources, then train them in this product. It is always worth that effort because it ends up saving a lot of time and resources on the development side of projects.

Overall, I would rate the product as a nine out of 10.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

José Orlando Maia

Data Engineer at a tech vendor with 1,001-5,000 employees

Apr 20, 2022

Download

We can parallelize the extraction from various servers simultaneously, accelerating our extraction

Pros and Cons

"The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country. We can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data."

"Lumada could have more native connectors with other vendors, such as Google BigQuery, Microsoft OneDrive, Jira systems, and Facebook or Instagram. We would like to gather data from modern platforms using Lumada, which is a better approach. As a comparison, if you open Power BI to retrieve data, then you can get data from many vendors with cloud-native connectors, such as Azure, AWS, Google BigQuery, and Athena Redshift. Lumada should have more native connectors to help us and facilitate our job in gathering information from these new modern infrastructures and tools."

What is our primary use case?

My primary use case is to provide integration with my source systems, such as ERP systems and SAP systems, and web-based systems, having them primarily integrate with my data warehouse. For this process, I use ETL to treat and gather all the information from my first system, then consolidate it in my data warehouse.

How has it helped my organization?

We needed to gather data from many servers at my company. We had probably 10 or 12 equivalent databases spread around the world, i.e., Brazil, Paraguay, or Chile, and had an instance in each country. So, this server is Microsoft SQL Server-based. We are using Lumada to get the data from these international databases. We can parallelize the extraction from various servers at the same time because we have the same structure, schemas, and tables in each of these SQL Server-based servers. This provides a good value for us, as we can extract data at the same time in parallel, which accelerates our extraction.

In one integration process, I can retrieve data from 10 or 12 servers at the same time in one transformation. In the past, using SQL Server or other manual tools, we needed to have 10 or 12 different processes, one per server. Using Lumada in parallel accelerates our extraction. The tools that Lumada provides enable us to transform the data during this process, integrating the data in our data warehouse with good performance.

Because Lumada uses Java virtual machines, we can deploy and operate in whatever operational system that we want. We can deploy on Linux, even when we had a Linux version from Lumada and a Windows version from Lumada.

It is simple to deploy my ETLs because Lumada has the Pentaho Server version. I installed the desktop version so we can deploy our transformations in the repository. We install our own Lumada on a server, then we have a web interface to schedule our ETLs. We are also able to reschedule our ETLs. We can schedule the hour that we want to run our ETL processes and transformations. We can schedule how many times we want to process the data. We can save all our transformations in a repository located in a Pentaho Server. Since we have a repository, we can save many versions of our transformation, such as 1.0, 1.1, and 1.2, in the repository. I can save four or five versions of a transformation. I can ask Lumada to run only the last version that I saved in the database.

Lumada offers a web interface to follow these transformations. We can check the logs to see if the transformations were successfully completed, we had a network query, or some database log issues. Using Lumada, there is a feature where we can get logs at the execution time. We can also be notified by email if transformations occurred successfully or failed. We have a file for each process that we schedule on Pentaho Server.

The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country. We can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data.

What is most valuable?

The features that I use the most are Microsoft Excel table input, S3 CSV Input, and CSV input. Today, the features that are more valuable to me are the table input, then the CSV input. These both are very important. We extract data from the table system for our transactional databases, which are commonly used. We also use the CSV input to get data from AWS S3 and our data lake.

In Lumada, we can parallelize the steps. The performance to query the databases for me is good, especially for transactional databases. Because Lumada uses Java, we can adjust the amount of memory that we want to use to do transformations. So, it is accessible. It's possible to set up the amount of memory that we want to use in the Java VM, which is good. Therefore, Lumada is good, especially with transactional database extraction. It has good performance, not higher performance, but good performance as we query data, and it is possible to parallelize the query. For example, if we have three or four servers to get the data, then we can retrieve the data at the same time, in parallel, in these databases. This is good because we don't need to wait while one of the extractions finishes.

Using Lumada, we don't need to do many manual transformations because we have a native company for many of our transformations. Thus, Lumada is a low-code tool to gather data from SQL, Python, or other transformation tools.

What needs improvement?

Lumada could have more native connectors with other vendors, such as Google BigQuery, Microsoft OneDrive, Jira systems, and Facebook or Instagram. We would like to gather data from modern platforms using Lumada, which is a better approach. As a comparison, if you open Power BI to retrieve data, then you can get data from many vendors with cloud-native connectors, such as Azure, AWS, Google BigQuery, and Athena Redshift. Lumada should have more native connectors to help us and facilitate our job in gathering information from these new modern infrastructures and tools.

For how long have I used the solution?

I have been using Lumada Data Integration for at least four years. I started using it in 2018.

How are customer service and support?

Because we are using the free version of Lumada, we have used only the support on the communities and forums on the Internet.

Lumada does have a paid version, where Hitachi support is specialized in Lumada support.

How was the initial setup?

It is simple to deploy Lumada because we can deploy our transformation in three to five simple steps, saving our transformation in a repository.

I open the Pentaho Server web-based version, then I find the transformation that I deployed. I can schedule this transformation at the hour or recurrence in which I want to run the transformation. It is easy. Because at the end of the process, I can save my transformation and Lumada generates the XML file. We can send this XML file to any user of Lumada, who can open up this model and get the transformation that I developed. As a deployment process, it is straightforward, simple, and not complex.

What was our ROI?

Using Lumada compared to using SQL manually, ETL development time is half the time it took using a basic manual transformation.

What's my experience with pricing, setup cost, and licensing?

There are more types of connectors, but you need to pay.

You need to go through the paid version to have Hitachi Lumada specialized support. However, if you are using the free version, then you will have only the community support. You will depend on the releases from Hitachi to solve some problem or questions that you have, such as bug fixes. You will need to wait for the newest versions or releases to solve these types of problems.

Which other solutions did I evaluate?

I also use Talend Data Integration. For me, Lumada is straightforward and makes it simpler to have transformations as drag and drops. Comparing Talend and Lumada, I think Lumada is easier to use, more than Talend. The comprehension needed for these tools is less with Lumada with than Talend. I can learn Lumada in a day and proceed with my transformations, using some tutorials, since Lumada is easier to use. Whereas, Talend is a more complex solution with more complex transformations.

In Talend's open version, i.e., free version, you won't have a Talend server to deploy models. Thus, you deploy Talend models on the server. If you want to schedule some transformation, then you need to use the operational system where you have infrastructure to run transformations and deploy them. For example, in Talend, we deployed a data model in Talend, but we needed to use Windows Scheduler to also schedule the packets in Talend to process the data in the free version of Talend. Whereas, in the free version of Lumada, we already had it based on the web server. Therefore, we can run our transformations and deploy them on the server. We can schedule in a web interface, which guides us with scheduling the data and checking our logs to see how many transformations we have at a time. This is the biggest difference between Talend and Lumada.

What other advice do I have?

I don't use many templates. I use the solution based on a case-by-case basis.

Considering that Lumada is a free tool, I would rate it as nine out of 10 for the free version.

Which deployment model are you using for this solution?

On-premises

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

reviewer1751571

Systems Analyst at a university with 5,001-10,000 employees

Jan 3, 2022

Download

Reuse of ETLs with metadata injection saves us development time, but the reporting side needs notable work

Pros and Cons

"The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs."

"The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet."

What is our primary use case?

We use it as a data warehouse between our HR system and our student system, because we don't have an application that sits in between them. It's a data warehouse that we do our reporting from.

We also have integrations to other, isolated apps within the university that we gather data from. We use it to bring that into our data warehouse as well.

How has it helped my organization?

Lumada Data Integration definitely helps with decision-making for our deans and upper executives. They are the ones who use the product the most to make their decisions. The data warehouse is the only source of information that's available for them to use, and to create that data warehouse we had to use this product.

And it has absolutely reduced our ETL development time. The fact that we're able to reuse some of the ETLs with the metadata injection saves us time and costs. It also makes it a pretty quick process for our developers to learn and pick up ETLs from each other. It's definitely easy for us to transition ETLs from one developer to another. The ETL functionality satisfies 95 percent of all our needs.

What is most valuable?

The ETL is definitely an awesome feature of the product. It's very easy and quick to use. Once you understand the way it works it's pretty robust.

Lumada Data Integration requires minimal coding. You can do more complex coding if you want to, because it has a scripts option that you can add as a feature, but we haven't found a need to do that yet. We just use what's available, the steps that they have, and that is sufficient for our needs at this point. It makes it easier for other developers to look at the things that we have developed and to understand them quicker, whereas if you have complex coding it's harder to hand off to other people. Being able to transition something to another developer, and having that person pick it up quicker than if there were custom scripting, is an advantage.

In addition, the solution's ability to quickly and effectively solve issues we've brought up has been great. We've been able to use all the available features.

Among them is the ability to develop and deploy data pipeline templates once and reuse them. The fact that it enables us to leverage metadata to automate data pipeline templates and reuse them is definitely one of the features that we like the best. The metadata injection is helpful because it reduces the need to create and maintain additional ETLs. If we didn't have that feature, we would have lots of duplicated ETLs that we would have to create and maintain. The data pipeline templates have definitely been helpful when looking at productivity and costs. The automation of data pipeline templates has also been helpful in scaling the onboarding of data.

What needs improvement?

The transition to the web-based solution has taken a little longer and been more tedious than we would like and it's taken away development efforts towards the reporting side of the tool. They have a reporting tool called Pentaho Business Analytics that does all the report creation based on the data integration tool. There are a lot of features in that product that are missing because they've allocated a lot of their resources to fixing the data integration, to make it more web-based. We would like them to focus more on the user interface for the reporting.

The reporting definitely needs improvement. There are a lot of general, basic features that it doesn't have. A simple feature you would expect a reporting tool to have is the ability to search the repository for a report. It doesn't even have that capability. That's been a feature that we've been asking for since the beginning and it hasn't been implemented yet. We have between 500 and 800 reports in our system now. We've had to maintain an external spreadsheet with IDs to identify the location of all of those reports, instead of having that built into the system. It's been frustrating for us that they can't just build a simple search feature into the product to search for report names. It needs to be more in line with other reporting tools, like Tableau. Tableau has a lot more features and functions.

Because the reporting is lacking, only the deans and above are using it. It could be used more, and we'd like it to be used more.

Also, while the solution provides us with a single, end-to-end data management experience from ingestion to insights, it does but it doesn't give us a full history of where it's coming from. If we change a field, we can't trace it through from the reporting to the ETL field. Unfortunately, it's a manual process for us. Hitachi has a new product to do that and it searches all the fields, documents, and files just to get your pipeline mapped, but we haven't bought that product yet.

For how long have I used the solution?

I've been using Lumada Data Integration since version 4.2. We're now on version 9.1.

What do I think about the stability of the solution?

The stability has been great. Other than for upgrades, it has been pretty stable.

What do I think about the scalability of the solution?

The scalability is great too. We've been able to expand the current system and add a lot of customizations to it.

For maintenance, surprisingly, it's just me who does so in our organization.

How are customer service and support?

The only issue that we've had is that it takes a little longer than we would like for support to resolve something, although things do eventually get incorporated. They're very quick to respond to an issue, but the fixing of the issue is not as quick.

For example, a few versions ago, when we upgraded it, we found that the upgrade caused a whole bunch of issues with the Oracle data types and the way the ETL was working with them. It wasn't transforming to the data types properly, the way we were expecting it to. In the previous version that we were using it was working fine, but the upgrade caused the issue, and it took them a while to fix that.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We didn't have another tool. This is the only tool we have used to create the data warehouse between the two systems. When we started looking at solutions, this one was great because it was open source and Java-based, and it had a Community Edition. But we actually purchased the Enterprise Edition.

How was the initial setup?

I came in after it was purchased and after the first deployment.

What's my experience with pricing, setup cost, and licensing?

We renew our license every two years. When I spoke to the project manager, he indicated that the pricing has been going up every two years. It's going to reach a point where, eventually, we're going to have to look at alternative solutions because of the price.

When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho. When they bought it, the price shot up. They said the increase is because of all the improvements they put into the product and the support that they're providing. From our point of view, their improvements are mostly on the data integration part of it, instead of the reporting part of it, and we aren't particularly happy with that.

Which other solutions did I evaluate?

I've used Tableau and other reporting tools, but Tableau sticks out because the reporting tool is much nicer. Tableau has its drawbacks with the ETL, because you can only use Tableau datasets. You have to get data into a Tableau file dataset and then the ETL part of it is stuck in Tableau forever.

If we could use the Pentaho ETL and the Tableau reporting we'd be happy campers.

What other advice do I have?

It's a great product. The ETL part of the product is really easy to pick up and use. It has a graphical interface with the ability to be more complex via scripting and features that you can add.

When looking at Hitachi Vantara's roadmap, the ability to upgrade more easily is one element of it that is important to us. Also, they're going more towards web-based solutions, instead of having local client development tools. If it does go on the web, and it works the same way it works on the client, that would be a nice feature. Currently, because we have these local client development tools, we have to have a VM client for our developers to use, and that makes it a little more tricky. Whereas if they put it on the web, then all our developers would be able to use any desktop and access the web for development.

When it comes to the query performance of the solution on large datasets, we haven't had any issues with it. We have one table in our data warehouse that has about 120 million rows and we haven't had any performance issues.

The solution gives you the flexibility to deploy it in any environment, whether on-prem or in the cloud. With our particular implementation, we've done a lot of customizations. We have special things that we bolted onto the product, so it's not as easy to put it onto the cloud for us. All of our customizations and bolt-ons end up costing us more because they make upgrades more difficult and time-consuming. We don't use an automated upgrade process. It's manual. We have to do a full reinstall and then apply all our bolt-ons and make sure it still works. If we could automate that process it would certainly reduce our costs.

In terms of updating to version 9.2, which is the latest version, we're going to look into it next year and see what level of effort is required and determine how it impacts our current system. They release a new update about every six months, and there is a major release every year or two, so it's quite a fast schedule for updates.

Overall, I would rate our satisfaction with our decision to purchase Hitachi products as a seven out of 10. I would definitely recommend the data integration tool but I wouldn't recommend the reporting tool.

Which deployment model are you using for this solution?

On-premises

Ridwan Saeful Rohman

Data Engineering Associate Manager at Zalora Group

Jul 4, 2024

Download

Good abstraction and useful drag-and-drop functionality but can't handle very large data amounts

Pros and Cons

"The abstraction is quite good."

"If you develop it on MacBook, it'll be quite a hassle."

What is our primary use case?

I still use this tool on a daily basis. Comparing it to my experience with other ETL tools, the system I created using this tool was quite straightforward. It involves extracting data from MySQL, exporting it to CSV, storing it on S3, and then loading it into Redshift.

The PDI Kettle Job and Kettle Transformation are bundled by a shell script, then scheduled and orchestrated by Jenkins.

We continue to use this tool primarily because many of our legacy systems still rely on it. However, our new solution is mostly based on Airflow, and we are currently in the transition phase. Airflow is a data orchestration tool that predominantly uses Python for ETL processes, scheduling, and issue monitoring—all within a unified system.

How has it helped my organization?

In my current company, this solution has a limited impact as we predominantly employ it for handling older and simpler ETL tasks.

While it serves well in setting up ETL tools on our dashboard, its functionalities can now be found in several other tools available in the market. Consequently, we are planning a complete transition to Airflow, a more versatile and scalable platform. This shift is scheduled to be implemented over the next six months, aiming to enhance our ETL capabilities and align with modern data management practices.

What is most valuable?

This solution offers drag-and-drop tools with a minimal script. Even if you do not come from an IT background or have no software engineering experience, it is possible to use. It is quite intuitive, allowing you to drag and drop many functions.

The abstraction is quite good.

If you're familiar with the product itself, it has transformational abstractions and job abstractions. We can create smaller transformations in the Kettle transformation and larger ones in the Kettle job. Whether you're familiar with Python or have no scripting background at all, the product is useful.

For larger data, we use Spark.

The solution enables us to create pipelines with minimal manual or custom coding efforts. Even without advanced scripting experience, it is possible to create ETL tools. I recently trained a graduate from a management major who had no experience with SQL. Within three months, he became quite fluent, despite having no prior experience using ETL tools.

The importance of handling pipeline creation with minimal coding depends on the team. If we switch to Airflow, more time is needed to teach fluency in the ETL tool. With these product abstractions, I can compress the training time to three months. With Airflow, it would take more than six months to reach the same proficiency.

We use the solution's ability to develop and deploy data pipeline templates and reuse them.

The old system, created by someone prior to me in my organization, is still in use. It was developed a long time ago and is also used for some ad hoc reporting.

The ability to develop and deploy data pipeline templates once and reuse them is crucial to us. There are requests to create pipelines, which I then deploy on our server. The system needs to be robust enough to handle scheduling without failure.

We appreciate the automation. It's hard to imagine how data teams would work if everything were done on an ad hoc basis. Automation is essential. In my organization, 95% of our data distributions are automated, and only 5% are ad hoc. With this solution, we query data manually, process it on spreadsheets, and then distribute it within the organization. Robust automation is key.

We can easily deploy the solution on the cloud, specifically on AWS. I haven't tried it on another server. We deploy it on our AWS EC2, but we develop it on local computers, including both Windows and MacBooks.

I have personally used it on both. Developing on Windows is easier to navigate. On MacBooks, the display becomes problematic when enabling dark mode.

The solution has reduced our ETL development time compared to scripting. However, this largely depends on your experience.

What needs improvement?

Five years ago, when I had less experience with scripting, I would have definitely used this product over Airflow, as the abstraction is quite intuitive and easier for me to work with. Back then, I would have chosen this product over other tools that use pure scripting, as it would have significantly reduced the time required to develop ETL tools. However, this is no longer the case, as I now have more familiarity with scripting.

When I first joined my organization, I was still using Windows. Developing the ETL system on Windows is quite straightforward. However, when I switched to a MacBook, it became quite a hassle. To open the application, we had to first open the terminal, navigate to the solution's directory, and then run the executable file. Additionally, the display becomes quite problematic when dark mode is enabled on a MacBook.

Therefore, developing on a MacBook is quite a hassle, whereas developing on Windows is not much different from using other ETL tools on the market, like SQL Server Integration Services, Informatica, etc.

For how long have I used the solution?

I have been consistently using this tool since I joined my current company, which was approximately one year ago.

What do I think about the stability of the solution?

The performance is good. I have not tested the product at its bleeding edge. We only perform simple jobs. In terms of data, we extract it from MySQL and export it to CSV. There are only millions of data points, not billions. So far, it has met our expectations and is quite good for a smaller number of data points.

What do I think about the scalability of the solution?

I'm not sure that the product could keep up with significant data growth. It can be useful for millions of data points, but I haven't explored its capability with billions of data points. I think there are better solutions available on the market. This applies to other drag-and-drop ETL tools as well, like SQL Server Integration Services, Informatica, etc.

How are customer service and support?

We don't really use technical support. The current version that we are using is no longer supported by their representatives. We didn't update it yet to the newer version.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We're moving to Airflow. The switch was mostly due to debugging problems. If you're familiar with SQL for integration services, the ETL tools from Microsoft have quite intuitive debugging functions. You can easily identify which transformation has failed or where an error has occurred. However, in our current solution, my colleagues have reported that it is difficult to pinpoint the source of errors directly.

Airflow is highly customizable and not as rigid as our current product. We can deploy simple ETL tools as well as machine learning systems on Airflow. Airflow primarily uses Python, which our team is quite familiar with. Currently, only two out of 27 people on our team handle this solution, so not enough people know how to use it.

How was the initial setup?

There are no separations between the deployment and other teams. Each of our teams acts as individual contributors. We handle the entire implementation process, from face-to-face business meetings, setting timelines, developing the tools, and defining the requirements, to production deployment.

The initial setup is straightforward. Currently, the use of version control in our organization is quite loose. We are not using any version control software. The way we deploy it is as simple as putting the Kettle transformation file onto our EC2 server and overwriting the old file, that's it.

What's my experience with pricing, setup cost, and licensing?

I'm not really sure about the pricing of the product. I'm not involved in procurement or commissioning.

What other advice do I have?

We put it on our AWS EC2 server; however, during development, it was on our local server. We deploy it onto our EC2 server. We bundle it in our shell scripts, and the shell scripts are run by Jenkins.

I'd rate the solution seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)

Disclosure: My company does not have a business relationship with this vendor other than being a customer.

Eric Smets

System Engineer at a tech services company with 11-50 employees

Oct 12, 2022

Download

Enterprise Edition pricing and reduced Community Edition functionality are making us look elsewhere

Pros and Cons

"We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic."

"The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition. That's what our impression is."

What is our primary use case?

We use it for two major purposes. Most of the time it is for ETL of data. And based on the loaded and converted data, we are generating reports out of it. A small part of that, the pivot tables and the like, are also on the web interface, which is the more interactive part. But about 80 percent of our developers' work is on the background processes for running and transforming and changing data.

How has it helped my organization?

Before, a lot of manual work had to be done, work that isn't done anymore. We have also given additional reports to the end-users and, based upon them, they have to take some action. Based on the feedback of the users, some of the data cleaning tasks that were done manually have been automated. It has also given us a fast response to new data that is introduced into the organization.

Using the solution we were able to reduce our ETL deployment time by between 10 and 20 percent. And when it comes to personnel costs, we have gained 10 percent.

What is most valuable?

The graphical user interface is quite okay. That's the most important feature. In addition, the different types of stores and data formats that can be accessed and transferred are an important component.

We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic. It's more about the business logic and less about the programming logic and that's really important.

Another important feature is that you can deploy it in any environment, whether it's on-premises or cloud, because you can reuse your steps. When it comes to adding to your data processing capacity dynamically that's key because when you have new workflows you have to test them. When you have to do it on a different environment, like your production environment, it's really important.

What needs improvement?

I would like to see better support from one version to the next, and all the more so if there are third-party elements that you are using. That's one of the differences between the Community Edition and the Enterprise Edition.

In addition to better integration with third-party tools, what we have seen is that some of the tools just break from one version to the next and aren't supported anymore in the Community Edition. What is behind that is not really clear to us, but the result is that we can't migrate, or we have to migrate to other parts. That's the most inconvenient part of the tool.

We need to test to see if all our third-party plugins are still available in a new version. That's one of the reasons we decided we would move from the tool to the completely open-source version for the ETL part. That's one of the results of the migration hassle we have had every time.

The support for the Enterprise Edition is okay, but what they have done in the last three or four years is move more and more things to that edition. The result is that they are breaking the Community Edition. That's what our impression is.

The Enterprise Edition is okay, and there is a clear path for it. You will not use a lot of external plugins with it because, with every new version, a lot of the most popular plugins are transferred to the Enterprise Edition. But the Community Edition is almost not supported anymore. You shouldn't start in the Community Edition because, really early on, you will have to move to the Enterprise Edition. Before, you could live with and use the Community Edition for a longer time.

For how long have I used the solution?

I have been working with Hitachi Lumada Data Integration for seven or eight years.

What do I think about the stability of the solution?

The stability is okay. In the transfer from before it was Hitachi to Hitachi, it was two years of hell, but now it's better.

What do I think about the scalability of the solution?

At the scale we are using it, the solution is sufficient. The scalability is good, but we don't have that big of a data set. We have a couple of billion data records involved in the integration.

We have it in one location across different departments with an outside disaster recovery location. It's on a cluster of VMs and running on Linux. The backend data store is PostgreSQL.

Maybe our design wasn't quite optimal for reloading the billions of records every night, but that's probably not due to the product but to the migration. The migration should have been done in a bit of a different way.

How are customer service and support?

I had contact with their commercial side and with the technical side for the setup and demos, but not after we implemented it. That is due to the fact that the documentation and the external consultant gave us a lot of information about it.

Which solution did I use previously and why did I switch?

We came from the Microsoft environment to Hitachi, but that was 10 years back. We switched due to the licensing costs and because there wasn't really good support for the PostgreSQL database.

Now, I think the Microsoft environment isn't that bad, and there is also better support for open-source databases.

How was the initial setup?

I was involved in the initial migration from Microsoft to Hitachi. It was rather straightforward, not too complex. Granted, it was a new toolset, but that is the same with every new toolset. The learning curve wasn't too steep.

The maintenance effort is not significant. From time to time we have an error that just pops up without our having any idea where it comes from. And then, the next day, it's gone. We get that error something like three times a year. Nobody cares about it or is looking into the details of it.

The migrations from one version to the next that we did were all rather simple. During that process, users don't have it available for a day, but they can live with that. The migration was done over a weekend and by the following Monday, everything was up and running again.

What about the implementation team?

We had some external help from someone who knows the product and had already had some experience with implementing the tool.

What was our ROI?

In terms of ROI, over the years it was a good step to make the move to Hitachi. Now, I don't think it would be. Now, it would be a different story.

What's my experience with pricing, setup cost, and licensing?

We are using the Community Edition. We have been trying to use and sell the Enterprise version, but that hasn't been possible due to the budget required for it.

Which other solutions did I evaluate?

When we made the choice, it was between Microsoft, Hitachi, and Cognos. The deciding factor in going with Hitachi was its better support for open-source databases and data stores. Also, the functionality of the Community version was what was needed by most of our customers.

What other advice do I have?

Our experience with the query performance of Lumada on large data sets is that Lumada is not what determines performance. Most of the time, the performance comes from the database or the data store underneath Lumada. Depending on how big your data set is, you have to change or optimize your data store and then you can work with large data sets.

The fine-tuning of the database that is done outside of Lumada is okay because a tool can't provide every insight into every type of data store or dataset. If you are looking into optimization, you have to use your data store optimization tools. Hitachi isn't designed for that, and we were not expecting to have that.

I'm not really that impressed with Hitachi's ability to quickly and effectively solve issues we have brought up, but it's not that bad either. It's halfway, not that good and not that bad.

Overall, our Hitachi solution was quite good, but over the last couple of years, we have been trying to move away from the product due to a number of things. One of them is the price. It's really expensive. And the other is that more and more of what used to be part of the Community Edition functionality is moving to the Enterprise Edition. The latter is okay and its functions are okay, but then we are back to the price. Some of our customers don't have the deeper pockets that Hitachi is aiming for.

Before, it was more likely that I would recommend Hitachi Ventara to a colleague. But now, if you are starting in an environment, you should move to other solutions. If you have the money for the Enterprise Edition, then I would say my likelihood of recommending it, on a scale of one to 10, would be a seven. Otherwise, it would be a one out of 10.

If you are going with Hitachi, go for the Enterprise version or stay away from Hitachi.

It's also really important to think in great detail about your loading process at the start. Make sure that is designed correctly. That's not directly related to the tool itself, but it's more about using the tool and how the loads are transferred.

Which deployment model are you using for this solution?

On-premises

Disclosure: My company does not have a business relationship with this vendor other than being a customer.