What is our primary use case?
We've had a lot of turnover in our agency in the past several years, and we have many new staff members who don't know where our data is. We had no metadata repository of any kind, virtual or otherwise. That was one issue we needed to address because people were creating data silos. They didn't know we already had the data in our transactional databases.
Another big issue is that we need to implement Phase 2 of an agency-wide data quality program, and we like the automated data quality rules in Lumada's Data Catalog. We couldn't find a regular, full-time data quality person, so we're using this solution for that.
And the third use case is that we're about three months into an application modernization project. It is going to be a sweeping, epic project for our agency in which we are moving to online applications, permitting and licensing on Salesforce. We believe that by tagging fields there before the consultant starts, and as the consultant works through it, and then after, if data is migrated elsewhere, it will be tremendously helpful for the success of the project.
How has it helped my organization?
We're using Data Catalog for our application modernization project. Our adjudication division tagged the fields or columns associated with specific types of investigations, whether it's domestic or stock pond. We're also tagging fields involved in the drill log process. We are tagging fields so that the consultants that we've hired to do the application modernization can see exactly which fields are involved in specific processes, hopefully saving time and resulting in a better end product, sooner.
We have also tagged every single customer name or name field in our database ecosystem, because we're going to be doing master data management.
It's going to help with change management and with the shared data and with making metadata available to people. I can type in the term "well," and I get 302 returns from our transactional databases, our data warehouse, and our document management system—because we have DocuShare and we scanned those tables—and the attribute tables from our GIS spatial data. I'm getting "well" wherever it occurs in that whole spectrum, in less than two seconds. How else would I have done that before? There was no way.
We've just started introducing the reports to our agency, in places here and there, but we need to have a "brown bag" gathering where I introduce it to everybody, so that people start using it. But even people who are experts with the data, the few that are left, are amazed because they found data sources that they didn't know existed.
And when talking about time spent on data stewardship tasks, I have used Lumada Data Catalog myself, and it has saved me time. Whereas before I needed to write a query or understand data somewhere else, now I can just type in the thing I'm looking for and find where it is. Part of data stewardship is data discovery and we're doing that for our application modernization project. People have already been using it to that extent. I have been asked to attend all the data discovery meetings now, even though I never worked in that section, and I always have Lumada Data Catalog ready. I've put my display up on the screen for everybody to see, to walk us through things.
In terms of how it has affected our data quality, we're early in that process. We have created automated rules in Phase 1 of our data quality program, but here's what's going to happen. The consultant is going to profile and cleanse certain data elements as we move more of our license, permit, and application processes online. This discovery process is going to reveal more critical data elements. As that happens, we are tagging those critical data elements, and we in the Enterprise Data Management Office will create data quality rules for them if we haven't already. We will then track that on the data quality dashboard. It will be a requirement for business units within our agency to watch that, because, once we've gone to the trouble of cleansing it, we're not going to let it get sloppy again. There is no way I could do all of that by myself just using Power BI. And with this tool I can share the report with people on SharePoint. We'll know what's critical. We'll have the rules set up, and they will be running automatically.
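To give a feel for what a data quality rule on a critical data element amounts to, here is a minimal sketch in plain Python. It is not Lumada Data Catalog's rule syntax or API. The rule name, the `customer_name` column, and the sample rows are all made up for illustration; the point is only that a rule is a check applied to every value, and the dashboard number is the share of values that pass.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class QualityRule:
    """A named check applied to every value of one critical data element."""
    name: str
    column: str
    check: Callable[[Optional[str]], bool]


def pass_rate(rule: QualityRule, rows: Iterable[dict]) -> float:
    """Return the fraction of rows whose value passes the rule (1.0 if no rows)."""
    rows = list(rows)
    if not rows:
        return 1.0
    passed = sum(1 for row in rows if rule.check(row.get(rule.column)))
    return passed / len(rows)


# Hypothetical rule: a customer name must be present and not blank.
name_not_blank = QualityRule(
    name="customer_name_not_blank",
    column="customer_name",
    check=lambda value: value is not None and value.strip() != "",
)

# Made-up sample rows, for illustration only.
sample_rows = [
    {"customer_name": "A. Rivera"},
    {"customer_name": "  "},
    {"customer_name": None},
    {"customer_name": "J. Chen"},
]

if __name__ == "__main__":
    rate = pass_rate(name_not_blank, sample_rows)
    print(f"{name_not_blank.name}: {rate:.0%} of rows pass")
```

Tracking a number like this over time for each critical data element is what lets a business unit notice when cleansed data starts to slip.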
What is most valuable?
The ability to easily and quickly ingest new data sources is the most valuable feature. I'm the lead on that, but I am not on the IT side. I'm on the enterprise data management side. I come from a background in water resources and I'm not an especially technical IT person, but my data governance lead and I are able to
- ingest the data
- quickly profile it
- do data identification and tagging.
And I have been able to create my own data quality rules in it as well. I don't like having to rely on other people to do stuff for me. I don't like having to wait. I want to be the one to do it myself.
It's also very easy to use. That's one of the things I tell other agencies and people. When it comes to visibility into your data assets it's great because it has the Galaxy View. I can see data lineage, somewhat, that way and I can see where there are data-sharing issues.
I'm able to link all the different reports, like the data discovery and data identification reports. I have linked to those on our Office of Enterprise Data Management SharePoint site. Even people who do not have licenses to the desktop version can still have access to the metadata in an easy way. And that was really important to our agency.
Also, the automated discovery offered by Data Catalog is really great. The Hitachi Vantara staff laugh at us a little bit because we don't have that much data compared to other people, but we have 13 million records and 38,000 columns and that's obviously too much for me to manage without a tool. It goes through the whole process so fast that it's incredible to me. It doesn't take long at all to go through our data sources.
What needs improvement?
As I've said, we've tagged a lot of fields that are related to specific processes, like the driller's log or, for example, if you want to get a license to be a well driller. Now, what I'm having to do for the consultants is create an Excel spreadsheet that has the name of the tag and a description of it. I'm now creating a data silo.
What would be helpful is a place, inside Lumada Data Catalog, where you can describe the tags that you're using. Otherwise, anybody coming into the system, or seeing the tag from the outside in one of the reports, is going to say, "What is that tag really referring to?" and has to know where my spreadsheet is.
For how long have I used the solution?
We have been using Hitachi Lumada DataOps - Data Catalog for three years now. We started with a proof of concept and then we had a break. We then bought the product for real because we were encouraged by the proof of concept.
What do I think about the stability of the solution?
The stability of the solution seems fine. We bought it when it was Io Tahoe, so we were concerned when we found out that it had been bought by another company. But the stability has been fine. It doesn't crash.
What do I think about the scalability of the solution?
It's very scalable. Big corporations with a ton of data compared to us could use it. There are other state agencies that have way more records than we do, like education or corrections or health. They could easily use this. But if you are a smaller agency like us, and you don't have a huge budget and you don't have a lot of people, you can also use this tool. Any organization of any size and any type of business could use this tool.
How are customer service and support?
They have been very helpful to us from day one. I'm not an incredibly technical person, and they have been very patient as we've set up things. For example, when we first started, we had to set it up behind the Department of Administration's firewall, and that department was really picky about it. It was really a drag to get credentials for the Hitachi people so they could help us. But they stuck with it and they never complained. They've just been nothing but patient and helpful the whole time.
Every time we have a problem, we just email them and somebody looks into it. I have no complaints about their support.
What's my experience with pricing, setup cost, and licensing?
We can afford it. We got a three-year contract. I'm hoping that when our contract expires it will still be reasonable enough for us to afford. One of the ways I pitched it is that I told my boss, "I can't find somebody to be a full-time data quality person at the rate the state pays. Can we get this tool instead of a data quality person?" If it were to go up in price a lot, I don't know if I would be able to keep it.
Which other solutions did I evaluate?
Our agency only started its enterprise data management program five years ago. At that time, we looked into Collibra Catalog and Informatica, and they did demos for us, because we needed a business glossary. We didn't even know there was such a thing as a data catalog. Those solutions were great, but they were very expensive and way beyond our budget.
We also heard from several sources over the years, people who don't know each other, that those solutions take a significant amount of staff, energy, and time to get up and running. I had to show return on our investment as soon as possible, and we don't have the people to staff that kind of work.
We did a proof of concept project with Lumada to see what it could do with our data. I didn't want to see it with other people's data, I wanted to see it with mine. Once we did the proof of concept we felt it would be useful for us and be more reasonable in terms of price for us.
What other advice do I have?
We haven't done very much classifying of assets because literally 99.9 percent of our data is public information. But the staff did help us set up some custom identifications to look for specific permit numbers. That is helpful because we want to know where they're showing up when they're not supposed to be there. They're supposed to be in certain fields and not others. Overall, we spend most of our time tagging data and working on data quality rules, but to the extent that we've used it, the classifying functionality seems to work fine.
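The idea behind a custom identification like this can be sketched in plain Python. This is not how Lumada Data Catalog is configured, and the permit-number format (two capital letters, a dash, five digits), the `permit_no` column, and the sample rows are all hypothetical. The sketch simply scans every field that is not an approved home for permit numbers and reports any value that looks like one.

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical permit-number format: two capital letters, a dash, five digits.
PERMIT_PATTERN = re.compile(r"\b[A-Z]{2}-\d{5}\b")

# Hypothetical set of columns where permit numbers are supposed to live.
ALLOWED_COLUMNS: Set[str] = {"permit_no"}


def misplaced_permits(rows: Iterable[Dict[str, object]]) -> List[Tuple[int, str, str]]:
    """Return (row index, column, matched text) for permit-like values
    found in columns outside ALLOWED_COLUMNS."""
    findings = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            # Skip approved columns and anything that is not text.
            if column in ALLOWED_COLUMNS or not isinstance(value, str):
                continue
            for match in PERMIT_PATTERN.finditer(value):
                findings.append((index, column, match.group()))
    return findings


# Made-up sample rows, for illustration only.
sample_rows = [
    {"permit_no": "WR-00123", "comments": "see WR-00456"},
    {"permit_no": "WR-00789", "comments": "none"},
]

if __name__ == "__main__":
    for index, column, text in misplaced_permits(sample_rows):
        print(f"row {index}: {text!r} found in column {column!r}")
```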
We haven't gotten to the organized data governance part yet, but I think it's going to be instrumental. When we first started our enterprise data management program, we centered all of our activities around our first data sets that went into our data warehouse. People were not as excited about that. To me, having a source of truth is exciting and invaluable. The thing that the newer staff and younger people got excited about was the business intelligence tool that we used to display the data that's in the data warehouse. People are not going to be excited about data governance for the sake of data governance. But when you have a tool like Lumada Data Catalog, it gives you a place to start.
If one section of our agency wants to change a column, we have a better chance, right away, of them understanding how it impacts other sections as well. If we can get our application developers and coders to use it and bring up the Galaxy View and the data lineage view, they will be able to show somebody, right off the bat, what the impact of their changes will be and explain it to them. Usually, they just say, "Well, that column's in a table that is really important..." but it's so abstract and vague.
Business people don't have the time or inclination to understand relational databases. But if they can see the visual Galaxy View, it's going to go a long way toward helping our data governance, because in fact, it kind of stalled. Our whole enterprise data management program actually stalled because we had no metadata management or metadata repository. You're not going to be able to improve in the other categories if you're always going to be super weak in metadata management, because it affects data quality, data governance, platform and operations, et cetera.
As storage is not an issue for us, we haven't used Data Catalog formally to look for duplicate data yet. But, in getting ready for our application modernization, we tagged all the tables and fields that have customer name or names and we've tagged reference tables. We'll have a table for "watershed" or "groundwater basin," and it shows up eight times, because it will be in the surface water database, the groundwater database, the wells database. We've already tagged the duplicate data, but we haven't used the automatic duplicate function yet.
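Once the tags exist, listing where the same reference data lives is a simple grouping exercise. Here is a minimal sketch in plain Python, assuming the tags have been exported as (tag, database, table) records. That export format, the database names, and the table names below are assumptions for illustration, and this is not the tool's automatic duplicate function.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def tables_by_tag(tagged: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Group (tag, database, table) records into tag -> sorted 'database.table'
    list, keeping only tags that appear in more than one place."""
    groups = defaultdict(set)
    for tag, database, table in tagged:
        groups[tag].add(f"{database}.{table}")
    return {tag: sorted(locs) for tag, locs in groups.items() if len(locs) > 1}


# Made-up tag export, for illustration only.
sample_tags = [
    ("watershed", "surface_water", "watershed_ref"),
    ("watershed", "groundwater", "watershed"),
    ("watershed", "wells", "watershed_lu"),
    ("drill_log", "wells", "drill_log"),
]

if __name__ == "__main__":
    for tag, locations in tables_by_tag(sample_tags).items():
        print(f"{tag}: {', '.join(locations)}")
```

A list like this is a starting point for master data management, because it shows which copies of a reference table would need to be reconciled.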