What is our primary use case?
My usual use case for Google Dataplex involves one team, BCSS, a software services and cloud software services group that manages its own data. They have around three to four petabytes of data spread across the Azure and Google Cloud environments. Bringing that data into the core ecosystem is taking time, which is why BCSS decided to use Google Cloud's out-of-the-box cataloging solutions.
That is how we are creating the cataloging information, which we are consuming in one place and then providing as a service.
What is most valuable?
The most valuable features or capabilities of Google Dataplex for me are cataloging and lineage.
I value lineage because we have a large Google Cloud footprint; that is why we are using the lineage from Google Dataplex.
We do use Google Dataplex's automated data discovery feature.
The data discovery feature has helped us streamline data cataloging because, as I mentioned, we have the Google Cloud environment and then other systems. Consuming and reading through petabytes of data is very difficult, which is why we went ahead with Google Dataplex, which collects the cataloging information from the Google Cloud environment.
What needs improvement?
Improving Google Dataplex is not our priority, so I have not looked for many specific areas that could be improved.
If I had to point out one area, it would be the connections; there is a lack of connectors, and it cannot connect to a wide variety of databases.
For the same reason, I would say Google Dataplex's centralized management layer is not there yet when it comes to simplifying data operations across environments. An organization such as Ericsson has many environments, and Google Dataplex cannot yet connect to each and every system.
For how long have I used the solution?
I have been working with Google Dataplex for two years.
How are customer service and support?
My team communicates with Google Dataplex technical support; as a leader, I do not get into the day-to-day activities, but as a team, we do talk to the technical team.
Based on my experience with the technical support team, I would rate them around a seven to eight; I would say it is better than the average quality we receive from other vendors.
As for my interaction with technical support, when my team was building the data catalog, their team actually visited our office and spent a good amount of time ensuring that we built the catalog properly and in line with industry standards.
Which other solutions did I evaluate?
We work with four more vendors: BigID, Snowflake, a Google cataloging system, and Alation, which we are in the process of getting.
What other advice do I have?
Recently, I have been working with Informatica, specifically with IDMC, data quality, and Informatica's integration systems.
Currently, I am not working with it, but I am in the process of getting it.
One of our counterpart teams uses Google Dataplex, and as a data leader, I manage that. It is good, but from an industry point of view, Informatica is much better.
Since I am managing this solution, I am using it internally in my company.
For Google Cloud data, Google Dataplex's AI-driven data classification has really helped; the out-of-the-box capabilities are good, and they have good classifiers. Compared with the rest of the industry, however, they are still evolving.
The part that is still evolving, for me, is the classifiers; Informatica has around 247 classifiers, whereas Google Cloud has a limited set.
Regarding Google Dataplex's data lifecycle management, I have not used that capability, but I know that it is there. Lifecycle management is something my team is building separately.
There is nothing that I would improve or enhance because that is secondary for me; for me, the main thing is Informatica.
Google Dataplex is a small use case for me; my main focus is Informatica. Less than 10% of our data is based in Google Cloud, and we were facing some challenges with it, which is why we used Google Dataplex, but it is not going into the core.
As for any positive impact of Google Dataplex on my work, nothing major has come up as of today other than the cataloging and lineage; with the limited usage we have, it is supporting us somewhat on the lineage side.
My overall review rating for Google Dataplex is 5 out of 10.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google