What is our primary use case?
I normally use H2O.ai for my machine learning tasks. For example, models I've built with H2O.ai include a taxi demand forecasting model and a lead scoring model. Most of my use cases involve running machine learning algorithms on data and then producing some kind of prediction.
I have used the AutoML feature in H2O.ai, which is one of its most powerful features: you don't need to worry about which algorithm is best for your problem, because AutoML selects the right model for your data and handles the rest. AutoML is a headline feature in many high-end tools, such as DataRobot and Databricks, but H2O.ai has offered it for a long time. Although I come from a data science background and prefer to evaluate each of my models myself, I've used AutoML in H2O.ai and think it's a great feature to have.
In my experience, AutoML reduces the time and expertise needed to develop machine learning models by roughly 50 to 60%. In the traditional process, I would choose five or six models, train each one, and then assess them against one or more metrics, which can be very time-consuming. Because AutoML automates that process and fits a large repository of models to the data, it saves an immense amount of time. Before AutoML, a data scientist would typically have four or five candidate models in mind; AutoML tries many more, so in terms of time spent on model creation, a manual approach simply cannot compete.
I once built an anomaly detection model with H2O.ai to flag anomalous payments, that is, to determine whether a payment was fraudulent. Another example is the taxi demand prediction model, which predicts how many taxis are needed at different shopping malls in Dubai. For instance, it might predict that 100 taxis will be needed at Dubai Mall at 10 a.m. tomorrow. In both cases, H2O.ai helped organizations make decisions based on the model output.
What is most valuable?
One of the things I really appreciate about H2O.ai is its flexibility in working with different languages. I mostly use Python for my data science work, and building machine learning models with it is very easy. H2O.ai also comes with Driverless AI, which lets you automate many tasks within one platform: connecting to your data, transforming it into the final features for your model, training the model, and visualizing or saving the results. What I appreciate most is that the whole flow can be automated, so the next time your data changes, your Driverless AI pipeline runs again and updates your predictions or model output. The flexibility and ease of use that H2O.ai provides are very good.
What needs improvement?
One improvement I would like to see in H2O.ai is better integration with different data sources; platforms like DataIQ and Databricks offer great integration with a wide range of them. H2O.ai could benefit from stronger integration with both real-time and offline data sources, as well as improved productionization, including better deployment options on platforms like Azure and CI/CD integration.
Given the growing trend of generative AI, I'd like to see upcoming releases of H2O.ai support LLM-based applications and vector databases. I would like a solution similar to Azure AI Foundry, which makes it easy to integrate different LLMs into applications, with H2O-GPT and other models available for various use cases.
For how long have I used the solution?
I first worked with H2O.ai in 2018, and the tool has evolved considerably since then. I have worked with it on and off for about seven years, and I've used it quite extensively for different tasks over the last six months or so.
What do I think about the stability of the solution?
Performance is somewhat subjective, since working with large datasets can lead to performance challenges on any platform. Overall, H2O.ai has been pretty stable for me and runs very well with small to medium datasets, although, as with any platform, larger datasets can introduce performance issues.
What do I think about the scalability of the solution?
H2O.ai is definitely scalable. It can run processes on Spark through Sparkling Water, which enables parallel execution, so it suits both simple and compute-intensive tasks.
How are customer service and support?
I haven't escalated any questions to technical support or interacted with the technical team in the last six months, but I have been in contact with them before.
My experience with the tech team has been good, and I don't have any negative feedback to share.
How would you rate customer service and support?
How was the initial setup?
I find the initial setup of H2O.ai to be pretty straightforward, without any complexities. This contrasts with platforms like DataIQ, which I find more complicated to set up.
What about the implementation team?
In my current organization, we purchased H2O.ai through a vendor. Normally, acquiring a solution involves a bidding process in which third-party vendors supply and implement it.
What was our ROI?
H2O.ai helps generate positive ROI because data scientists spend less time building models and produce more output. The two projects I mentioned earlier were very successful and delivered significant ROI in terms of business KPIs.
What other advice do I have?
I would rate the technical support a nine.
For organizations considering H2O.ai, my advice is that it is a great, flexible tool for machine learning tasks that doesn't carry the high licensing costs of alternatives like DataIQ, DataRobot, and Databricks. Users should make full use of the AutoML functionality, which can significantly streamline model production.
On a scale of 1 to 10, I would rate H2O.ai a nine, as it is a very good product.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Microsoft Azure