What is our primary use case?
I use Data Hub, but I also have some administrative responsibilities in it. I'm not an end user; the end users are business users. My administrative role in the tool is to make the metadata available to all Uber users.
I'm a Data Quality Engineer focused on data governance. I manage Uber's metadata, and I also use the tool to apply data quality rules. In my current job, I focus on applying these rules, managing the metadata, and ensuring it is accurate for end users, which is why I use Data Hub.
What is most valuable?
One of the biggest advantages of Data Hub is its integration capability. For example, a development department built the integration between Data Hub and BigQuery. When an integration like this is done well, it is possible to check data lineage, which I consider a very important subject in data governance. Lineage cannot be traced manually, so having a tool that shows it from the source to the target tables helps us a lot. I think this is one of the best advantages we have.
In my case, Data Hub helps to analyze data from various sources.
What needs improvement?
I know that integrations are not easy to do, and I believe that is because ours is a customized solution. Software developers always have to be involved. It's complicated: every time we want to integrate new assets or sources, we need to open a ticket or a request with another department. When I worked with Atlan, for example, I was able to connect different sources in a very user-friendly way. I just needed to set up some configurations and connect to the source, without being a software developer or writing any back-end code. It was simply a feature of the data catalog that let me connect to different kinds of sources. That's why I see the customized solution as a disadvantage. Data Hub itself is a very good tool; years ago I had the opportunity to work with the open-source version, which had a clear interface and was easy to connect. At Uber, we need to submit a request whenever we want to integrate new sources.
Regarding how intuitive Data Hub is for analytics, I would say that some data quality dimensions are available to us. For example, for each field or column in a table, we can see the frequency of values, how many values there are for a specific type or category, whether there are new or null values, and whether a column is empty, along with some other metrics. These relate to data quality dimensions such as nullability. That is all we have in terms of features. When I worked with Atlan, there was a feature I liked very much: the ability to see a data sample. When I clicked on a table, the data catalog showed a screen with some of its rows, so I could see some values and understand what the table represents without needing SQL skills. We don't have this feature in Data Hub the way it is implemented at Uber, and I think it would be very useful for analytics.
The integration part could be better, but again, that's because it's a customized solution. I think using the native version of the tool would make it simpler. Improving integration and the process of setting up new data quality rules would matter a lot to data governance professionals like me.
For how long have I used the solution?
I've been using Data Hub for a year and a half.
What do I think about the stability of the solution?
Since I started using Data Hub, it has been very stable; I would say one hundred percent stable. I have never encountered issues checking datasets, columns, or their metrics. It has always worked very well in that regard.
What do I think about the scalability of the solution?
I think Data Hub can scale quickly in its native form, but with a customized solution, scaling takes more time.
How are customer service and support?
My support is internal: when I have questions or requests, I direct them to an Uber support team, not to the vendor. When I worked with Atlan and needed support, they were very good at handling my requests directly. I had direct contact with the vendor, so responses were very fast. At the moment, I don't have that; I send my requests to an internal Uber department.
Which solution did I use previously and why did I switch?
I'm no longer using Atlan because I left the company where I used it. I moved to another consultancy group, and now I work with other platforms.
I am now working with a different platform that is also for data governance and metadata management. Its back end is Data Hub, but the user interface is customized for this client. I'm currently working for Uber.
How was the initial setup?
Because ours is a customized Data Hub deployment, I don't have many details about the installation and deployment process. However, when I was using Atlan, I saw that it was implemented very quickly. I believe both tools are easy to implement in their native form, but because Uber chose a customized solution, the implementation became more difficult and complex.
What was our ROI?
In terms of ROI, I would say that Atlan is better. I had a very good experience with Atlan, and I believe it delivers value faster. Speed matters a lot in organizations today; people want to see results quickly. I believe Atlan's approach is better than Data Hub's.
Compared with the way Data Hub is currently implemented at Uber, Atlan is much better and much faster.
Which other solutions did I evaluate?
I have also worked with Databricks. It is not an Amazon product; I associate it with Microsoft, which offers it as Azure Databricks, although Databricks itself is an independent company.
What other advice do I have?
I have some experience with Data Hub.
I believe Data Hub relies heavily on APIs, but I don't think I'm the right person to answer that because it is a technical aspect I don't fully understand, so I can't give you an accurate answer. I know that the software development team that works on this customized solution uses APIs, but I can't say whether their performance is good or not.
Real-time and batch processing are very important for me and my organization because some datasets are critical to the business. With batch processing, the organization can load a very large dataset, for example, and make it available in the data catalog in a short time. I agree that this is important.
In both of my experiences, the catalog was integrated with GCP. I don't have experience with other data warehouses, so both Atlan before and Data Hub now have been connected to GCP.
I don't use any other tools, such as CRM, storage, or architecture management tools; just Data Hub.
Summarizing everything I've discussed about the product, I would give Data Hub a score of seven out of ten.
Which deployment model are you using for this solution?
Private Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Google