What is our primary use case?
We use Apache HBase as a lookup database for customer data, mainly for eligibility checks across different kinds of customers. The customer data is stored in Apache HBase, and our lookup jobs run against it.
What is most valuable?
The most valuable feature for us is the SQL abstraction we can put on top of Apache HBase. HBase is not an SQL database and does not accept SQL statements directly, so we use a tool called Apache Phoenix as an abstraction layer over it. With Apache Phoenix integrated with Apache HBase, we can run SQL queries and the other queries we need, including the lookup jobs I mentioned earlier.
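As a rough illustration of what such a lookup looks like, here is a minimal Python sketch of a parameterized SQL point lookup issued through a DB-API cursor. The table and column names (CUSTOMERS, CUSTOMER_ID, SEGMENT, CREDIT_ELIGIBLE) and the function names are hypothetical, and the standard library's sqlite3 stands in for a real Phoenix connection so that the snippet runs on its own. It is not our production code.

```python
import sqlite3

# Hypothetical lookup query; table and column names are illustrative only.
LOOKUP_SQL = (
    "SELECT CUSTOMER_ID, SEGMENT, CREDIT_ELIGIBLE "
    "FROM CUSTOMERS WHERE CUSTOMER_ID = ?"
)


def lookup_customer(cursor, customer_id):
    """Run a point lookup by primary key; return one row as a dict, or None."""
    cursor.execute(LOOKUP_SQL, (customer_id,))
    row = cursor.fetchone()
    if row is None:
        return None
    return {
        "customer_id": row[0],
        "segment": row[1],
        "credit_eligible": bool(row[2]),
    }


def demo_connection():
    """Build an in-memory stand-in database (sqlite3, NOT Phoenix) with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CUSTOMERS ("
        "CUSTOMER_ID TEXT PRIMARY KEY, SEGMENT TEXT, CREDIT_ELIGIBLE INTEGER)"
    )
    # Against Phoenix, writes use UPSERT INTO ... rather than INSERT INTO ...
    conn.executemany(
        "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
        [("c-001", "retail", 1), ("c-002", "business", 0)],
    )
    return conn


if __name__ == "__main__":
    cur = demo_connection().cursor()
    print(lookup_customer(cur, "c-001"))
    print(lookup_customer(cur, "c-404"))
```

Filtering on the primary key matters here: Phoenix maps the primary key to the HBase row key, so a query of this shape can be served as a direct key lookup rather than a scan.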
HBase's integration with Hadoop and HDFS has definitely influenced our data storage strategy because Hadoop is the foundation these tools run on. HDFS is the storage layer for Hadoop, so the query results from our lookup jobs can be placed there and moved through data pipelines to other data sources. In short, Hadoop is the distributed system that everything operates on, and HDFS is the storage.
The impact of in-memory processing on our data operations is significant; it makes our processes fast. The in-memory processing lets us optimize our queries and helps us run concurrent queries and other jobs such as the lookup jobs we always use Apache HBase for.
What needs improvement?
Apache HBase could be improved by optimizing the integration with Apache Phoenix; the abstraction and the lookup jobs sometimes run into issues when there are too many requests. Resource optimization isn't always as successful as it should be, which can cause some queries and lookup jobs to fail. For instance, during eligibility checks for credit, a job might fail if there are many requests on the database, and after such a failure we cannot resume the queries from the point where they stopped. If the tool used fewer resources and allowed those jobs and queries to pick up from where they stopped, that would be a great addition.
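The resume behavior I have in mind can be approximated on the client side. The following is a minimal Python sketch under stated assumptions, not an HBase or Phoenix feature: `run_lookups`, `TransientLookupError`, and the checkpoint dict are hypothetical names, and the lookup function is passed in by the caller. It retries a failing lookup with exponential backoff and records the position of the last completed key so that a rerun continues from there.

```python
import time


class TransientLookupError(Exception):
    """Hypothetical error type standing in for an overloaded-database failure."""


def run_lookups(keys, lookup, checkpoint, max_retries=3, base_delay=0.0):
    """Process keys in order, resuming from checkpoint['next'] if it is set.

    checkpoint is a plain dict here; a real job would persist it somewhere
    durable. Returns only the results gathered during this run.
    """
    results = {}
    start = checkpoint.get("next", 0)
    for i in range(start, len(keys)):
        for attempt in range(max_retries):
            try:
                results[keys[i]] = lookup(keys[i])
                break
            except TransientLookupError:
                if attempt == max_retries - 1:
                    # Give up for now; the checkpoint still points at this key.
                    raise
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
        checkpoint["next"] = i + 1  # advance only after a successful lookup
    return results
```

Calling `run_lookups` again with the same checkpoint after a failure skips the keys that already completed and starts at the one that failed.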
For how long have I used the solution?
I have been working with Apache HBase for one year and six months.
What was my experience with deployment of the solution?
The deployment process of Apache HBase is large and complex. I cannot describe it entirely in this review, but we set up Hadoop as the distributed system, along with HDFS, Hive for metastore needs, and other tools such as Apache NiFi for data pipelines, Apache Airflow for scheduling jobs, and Apache Superset for visualization. Integrating those tools into the data lake for the Big Data platform and configuring everything to function properly took months. The first-time setup is complex because it involves many different tools.
What do I think about the stability of the solution?
The stability and reliability of Apache HBase can be quite good as long as you maintain a solid cluster environment and a good resource optimization process. However, issues might arise regarding compute resources such as CPU and memory, which is something to consider. Ultimately, the environment you select is a key factor in determining whether you experience good stability and reliability or not.
What do I think about the scalability of the solution?
Apache HBase is good in terms of scalability; its scalability largely depends on the type of deployment. Ours is configured using the official Helm charts for Apache HBase on a Kubernetes cluster, which makes it quite scalable.
How are customer service and support?
I don't often communicate with the technical support and customer service for Apache HBase. Most of the time, we rely on the official documentation and community forums for information about new features or to resolve the problems we encounter.
Which solution did I use previously and why did I switch?
We did not use a different solution for these use cases before Apache HBase.
How was the initial setup?
I participated in the initial setup and deployment of Apache HBase.
Which other solutions did I evaluate?
We did not evaluate other options when choosing Apache HBase; we decided to go with Apache HBase along with the abstraction of Apache Phoenix at the time of implementation, and it's been working for us.
What other advice do I have?
I work as a Big Data engineer for a corporation that uses Apache HBase for its Big Data platform.
We're using a version of Apache HBase that is compatible with the other Big Data tools that we are using on the platform, but it's not the latest one.
We mostly use Apache HBase as a lookup database for customer data and for the eligibility checks we have to do for different kinds of customers. We store the customer data in Apache HBase and run lookup jobs against it.
I utilize the automatic sharding of Apache HBase. Sharding partitions a data set into smaller segments (regions, in HBase terms, each covering a range of row keys) so that a query only touches the segments it needs. A Big Data platform consumes many resources, so we rely on sharding to keep our queries fast and to reduce the CPU and memory they use.
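To show the idea behind range-based sharding, here is a small Python sketch. It is a simplified model, not HBase's implementation: the split points, `region_for`, and `salted_key` are hypothetical, and the salting step is one common row-key design technique rather than something described in this review. Each region holds a contiguous range of row keys, so finding the region for a key is a binary search over the split points.

```python
import bisect
import hashlib

# Hypothetical split points: region i holds keys in [SPLITS[i-1], SPLITS[i]).
SPLITS = [b"4", b"8", b"c"]  # four regions over hex-prefixed keys


def region_for(row_key, splits=SPLITS):
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(splits, row_key)


def salted_key(customer_id):
    """Prefix the id with one hex char of its hash so sequential ids spread out."""
    prefix = hashlib.sha256(customer_id.encode()).hexdigest()[0]
    return (prefix + "|" + customer_id).encode()


if __name__ == "__main__":
    for cid in ("cust-1", "cust-2", "cust-3"):
        key = salted_key(cid)
        print(cid, key, "-> region", region_for(key))
```

The trade-off with salting is that a point lookup has to recompute the prefix from the id, and a range scan has to fan out across all prefixes.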
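Apache HBase does much of its processing in memory: recent writes are held in an in-memory store (the MemStore) and frequently read data is served from the block cache, so many operations avoid going to disk. The data itself is still persisted to HDFS.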
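To make the in-memory behavior concrete, here is a toy Python model. It is not HBase's implementation: `ToyStore` and its methods are hypothetical names, and the flushed snapshots merely stand in for files on HDFS. It shows recent writes being served from an in-memory buffer, with older data read from flushed snapshots, newest first.

```python
class ToyStore:
    """Toy model of a write buffer in memory plus flushed, immutable snapshots."""

    def __init__(self, flush_threshold=2):
        self.memstore = {}      # recent writes, held in memory
        self.store_files = []   # flushed snapshots, newest last ("on disk")
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        """Write to the in-memory buffer; flush once it reaches the threshold."""
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Persist the buffer as an immutable snapshot and start a fresh one."""
        if self.memstore:
            self.store_files.append(dict(self.memstore))
            self.memstore = {}

    def get(self, key):
        """Return (value, source); memory is checked before flushed snapshots."""
        if key in self.memstore:
            return self.memstore[key], "memory"
        for snapshot in reversed(self.store_files):  # newest first
            if key in snapshot:
                return snapshot[key], "file"
        return None, "miss"


if __name__ == "__main__":
    store = ToyStore(flush_threshold=2)
    store.put("a", 1)
    print(store.get("a"))
    store.put("b", 2)  # reaches the threshold and triggers a flush
    print(store.get("a"))
```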
The documentation I used is generally good, but its visual presentation could improve; it seems outdated. However, since it's an open-source tool, one cannot expect everything to be perfect, and the maintainers are typically driven by passion rather than money. Overall, it's good documentation, and I've referenced it to address various problems and implementations.
Based on my experience, I would rate Apache HBase an eight out of ten.
I wonder if there are any other options that you would recommend?
Which deployment model are you using for this solution?
On-premises