What is our primary use case?
We use Informatica PowerCenter for ETL and, alongside it, Informatica Data Quality for data profiling, validation, duplicate removal, and data cleansing.
Our client is a restaurant chain, so we work with sales data. There are more than 5,600 franchise sites in our project. We receive data from the sites once end of day (EOD) has been cleared for all of them. Some of this data has duplicate records, and some has wrong entries or null values. If we load the data directly into our target tables, it can cause errors in querying and reporting tools and leave the data inconsistent. We use Informatica Data Quality to profile and cleanse the data instead of loading it as received.
We had one situation where no data was missing, but the reports showed sales exceeding our daily limit. We assumed the restaurant had made a profit, but after due diligence we realized that there were duplicate entries. We then understood that we needed a data quality check, which is why we integrated Informatica Data Quality with Informatica PowerCenter.
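To illustrate how duplicate entries can inflate a daily sales figure, here is a minimal Python sketch. It is not how Informatica Data Quality works internally; the row layout (site ID, receipt ID, amount), the sample values, and the function names are all assumptions made for the example.

```python
# Hypothetical end-of-day sales rows: (site_id, receipt_id, amount).
rows = [
    ("site-001", "r-1", 120.0),
    ("site-001", "r-2", 80.0),
    ("site-001", "r-2", 80.0),  # the same receipt delivered twice
    ("site-002", "r-1", 50.0),
]


def daily_total(records):
    """Sum the amount column across all records."""
    return sum(amount for _, _, amount in records)


def deduplicate(records):
    """Keep the first record for each (site_id, receipt_id) pair."""
    seen = set()
    unique = []
    for record in records:
        key = (record[0], record[1])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


raw_total = daily_total(rows)                 # includes the duplicate receipt
clean_total = daily_total(deduplicate(rows))  # duplicate receipt counted once
print(raw_total, clean_total)
```

In this made-up batch, the raw total is higher than the deduplicated one by exactly the amount of the repeated receipt, which is the kind of gap we saw in our reports.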
What is most valuable?
Some of the best features Informatica Data Quality offers include AI automation through CLAIRE, which integrates AI with Informatica Data Quality, and its user-friendly drag-and-drop interface. These are easy to use for anyone with minimal knowledge of ETL.
Querying every table for duplicate entries or null values is not practical, and doing it for each site is impossible. Once we connect Informatica Data Quality to the databases and use the drag-and-drop function to specify the conditions we need, it directly checks whether the values are within the threshold, and we can set conditions such as rejecting records with null values. It also features a match and merge condition, which supports data profiling and data cleansing.
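The kinds of conditions described above (no null values, values within a threshold, no duplicates) can be sketched in plain Python. This is only an illustration of the rules, not the Informatica Data Quality interface; the field names, the `max_amount` threshold, and the `validate` function are hypothetical.

```python
REQUIRED_FIELDS = ("site_id", "receipt_id", "amount")  # assumed column names


def validate(records, max_amount):
    """Split records into accepted rows and (row, reason) rejections."""
    accepted, rejected = [], []
    seen = set()
    for rec in records:
        key = (rec.get("site_id"), rec.get("receipt_id"))
        if any(rec.get(field) is None for field in REQUIRED_FIELDS):
            rejected.append((rec, "null value"))
        elif not 0 <= rec["amount"] <= max_amount:
            rejected.append((rec, "amount out of range"))
        elif key in seen:
            rejected.append((rec, "duplicate"))
        else:
            seen.add(key)
            accepted.append(rec)
    return accepted, rejected


batch = [
    {"site_id": "site-001", "receipt_id": "r-1", "amount": 120.0},
    {"site_id": "site-001", "receipt_id": "r-1", "amount": 120.0},
    {"site_id": "site-002", "receipt_id": None, "amount": 40.0},
    {"site_id": "site-003", "receipt_id": "r-9", "amount": 999999.0},
]

accepted, rejected = validate(batch, max_amount=10000.0)
for rec, reason in rejected:
    print(reason, rec)
```

Keeping the reason next to each rejected row makes it easier to see which rule a site's data is failing.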
What needs improvement?
Compared to other tools offering similar features, Informatica Data Quality is somewhat costly. The scalability is also not up to the mark compared to current market trends.
When we want to work on a large dataset and scale the data up or down, it requires a lot of work. We also need to reboot the server every time there is a data scalability issue. Additionally, the pricing is quite high for the kind of features it offers.
For how long have I used the solution?
I have been using Informatica Data Quality for almost one year, since my internship days.
What do I think about the stability of the solution?
Informatica Data Quality is stable, but sometimes there are server crashes that require a reboot of the entire server.
What do I think about the scalability of the solution?
The scalability is not up to the mark in my view, because even a small increase in data, such as in the number of rows, can cause the server to crash and require a reboot.
How are customer service and support?
The customer support for Informatica Data Quality is not up to the mark, but the documentation is reasonably helpful, and we can learn about the features and common problems from it.
Which solution did I use previously and why did I switch?
Before we used Informatica Data Quality, the data was inconsistent on some days, and we thought the source itself was the cause. Later we discovered that the problem occurred during the transformation phase in Informatica PowerCenter or the SnapLogic tool. After we integrated Informatica Data Quality with Informatica PowerCenter, there was minimal chance of missing data, null value entries, or duplicate entries. The reporting and data quality have improved significantly since then.
How was the initial setup?
There were very few errors, and those could be managed by changing the metrics in Informatica Data Quality or Informatica PowerCenter. Reporting has also become faster because there are no null values or duplicate records that could cause workflows or sessions to fail. Since we integrated Informatica Data Quality, we have seen fewer situations where users report that data is missing or incomplete.
What was our ROI?
Since we integrated Informatica Data Quality in our project, the amount of manual intervention has gone down, so the team has become smaller. There were 12 people working on BI solely focused on data monitoring and error handling. After integrating Informatica Data Quality, this has been reduced to five people, resulting in cost savings for our project. Turnaround time has also improved; previously, we checked manually for missing data or manually ran a pipeline every three to four hours to check for missing or null values. This process has been automated with Informatica Data Quality.
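As a rough picture of the check we used to run by hand every few hours, the sketch below counts missing values per field in a batch. It is a simplified stand-in written in Python, not our actual pipeline or anything from Informatica Data Quality; the `profile` function, the field names, and the sample rows are assumptions.

```python
def profile(records, fields):
    """Count missing (None or absent) values per field across all records."""
    null_counts = {field: 0 for field in fields}
    for rec in records:
        for field in fields:
            if rec.get(field) is None:
                null_counts[field] += 1
    return {"rows": len(records), "null_counts": null_counts}


sample = [
    {"site_id": "site-001", "receipt_id": "r-1", "amount": 120.0},
    {"site_id": "site-002", "receipt_id": "r-7", "amount": None},
    {"receipt_id": "r-3", "amount": 15.5},  # site_id missing entirely
]

summary = profile(sample, ("site_id", "receipt_id", "amount"))
print(summary)
```

A scheduled job could run a summary like this after each load and raise an alert when any count is above zero, which is the step that no longer needs a person.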
What's my experience with pricing, setup cost, and licensing?
I have been informed by our management team that the pricing is high, but I am not sure of the specific figures.
Which other solutions did I evaluate?
We have evaluated the quotations from Collibra and Monte Carlo, which are similar tools to Informatica Data Quality, but since we were already using Informatica PowerCenter, we were keen to adopt Informatica Data Quality.
What other advice do I have?
Informatica Data Quality is not a tool that any person with ETL knowledge can use. Proper training on the tool is necessary, as it is somewhat complex. A minimum of one month of training is required before operating in the production environment. Additionally, reviewing the documentation is crucial before working with the tool.
My overall review rating for Informatica Data Quality is 7 out of 10.
Which deployment model are you using for this solution?
On-premises
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other