- Cluster rolling restarts
- Cluster wide configuration management
System Engineer at a tech company with 10,001+ employees
For the clusters using CM, we are able to more tightly control and manage the configuration of all nodes in the clusters. However, HBase 1.0 has stability issues, and processing speed needs improvement.
Pros and Cons
- "For the clusters using CM, we are able to more tightly control and manage the configuration of all nodes in the clusters."
- "Cloudera 5 is currently very unstable. Between two Cloudera 5 clusters, we have an incident at least twice a week due to what are now outstanding bugs."
What is most valuable?
How has it helped my organization?
For the clusters using CM, we are able to more tightly control and manage the configuration of all nodes in the clusters.
We are currently running six production clusters totaling 900+ nodes and are building three more. If someone has applied a custom configuration to a node without communicating it, I can override that configuration and bring the node back in line with how we’ve decided to run the cluster. That is very beneficial.
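To make the drift-correction idea concrete, here is a minimal sketch of comparing one node's settings against a cluster-wide baseline. It is not how Cloudera Manager or Chef implement this; the function `find_drift` and the sample values are hypothetical, and the HDFS property names are used only as illustrative keys.

```python
from typing import Dict, Optional, Tuple


def find_drift(
    baseline: Dict[str, str], node_config: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (expected, actual)} for every setting where the node
    differs from the cluster-wide baseline.

    A key set only on the node (an undeclared local override) reports
    expected=None; a key missing from the node reports actual=None.
    """
    drift = {}
    # Union of keys so that both missing and extra settings are reported.
    for key in sorted(baseline.keys() | node_config.keys()):
        expected = baseline.get(key)
        actual = node_config.get(key)
        if expected != actual:
            drift[key] = (expected, actual)
    return drift


if __name__ == "__main__":
    # Hypothetical baseline and node values, for illustration only.
    baseline = {"dfs.replication": "3", "dfs.blocksize": "134217728"}
    node = {"dfs.replication": "2", "dfs.blocksize": "134217728", "my.local.tweak": "on"}
    for key, (expected, actual) in find_drift(baseline, node).items():
        print(f"{key}: node has {actual!r}, baseline wants {expected!r}")
```

Bringing the node "back in line" then amounts to redeploying the baseline values for every reported key and removing the keys the baseline does not declare.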
What needs improvement?
HBase 1.0 stability and processing speed are major areas for improvement. Right now, our Cloudera 5 clusters run four to seven times slower than our Cloudera 4 clusters with our Storm and Kafka topologies, which makes real-time processing a major challenge.
The Cloudera Manager (CM) API is very limited and difficult to use when managing multiple clusters from the same CM instance.
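As an illustration of the per-cluster fan-out involved when one CM instance manages several clusters, here is a minimal sketch using only the Python standard library. It assumes the CM REST API's `/api/<version>/clusters` and `/api/<version>/clusters/<name>/services` layout, list responses shaped as `{"items": [...]}`, and HTTP basic auth; the base URL, API version string, and helper names are placeholders, and the code has not been checked against a live CM instance.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def cluster_names(clusters_json: str) -> List[str]:
    """Extract cluster names from a GET /api/<version>/clusters response.
    Assumes the list-response shape {"items": [{"name": ...}, ...]}."""
    return [item["name"] for item in json.loads(clusters_json).get("items", [])]


def services_url(base_url: str, api_version: str, cluster: str) -> str:
    """Build the per-cluster services endpoint. Cluster names such as
    'Cluster 1' contain spaces, so they are percent-encoded."""
    return f"{base_url}/api/{api_version}/clusters/{urllib.parse.quote(cluster)}/services"


def get_json(url: str, user: str, password: str) -> str:
    """GET a URL with HTTP basic auth and return the raw response body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode()


def services_per_cluster(base_url: str, api_version: str, user: str, password: str) -> Dict[str, List[str]]:
    """List services for every cluster in one CM instance; each cluster
    needs its own request, scoped by cluster name."""
    clusters = cluster_names(get_json(f"{base_url}/api/{api_version}/clusters", user, password))
    result = {}
    for name in clusters:
        body = get_json(services_url(base_url, api_version, name), user, password)
        result[name] = [svc["name"] for svc in json.loads(body).get("items", [])]
    return result
```

A call would look like `services_per_cluster("http://cm.example.com:7180", "v10", "admin", "secret")`, where the host, version, and credentials are all placeholders.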
For how long have I used the solution?
We've used it for approximately two years. We also use Cloudera Manager, which I would rate 6/10.
Buyer's Guide
Cloudera Distribution for Hadoop
March 2026
Learn what your peers think about Cloudera Distribution for Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: March 2026.
885,264 professionals have used our research since 2012.
What was my experience with deployment of the solution?
No issues encountered.
What do I think about the stability of the solution?
Cloudera 5 is currently very unstable. Between two Cloudera 5 clusters, we have an incident at least twice a week due to what are now outstanding bugs.
What do I think about the scalability of the solution?
Clusters are very easy to deploy and scale as large as you want. However, once the CM management cluster is created, it is difficult to scale it up as you add more clusters to the same CM instance.
Which solution did I use previously and why did I switch?
No previous solution was used.
How was the initial setup?
We were already running one production cluster with approximately 75 nodes when I joined, so I’m not familiar with what was needed to get the initial production cluster up. Once I joined, I assisted in standing up the additional nodes and clusters using our Chef automation.
What about the implementation team?
In-house, via Chef automation. Chef, or a similar system, makes it much simpler to stand up large-scale clusters. That said, I have not used or evaluated the vendor team's implementation methods.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Architect at a marketing services firm with 501-1,000 employees
Cloudera Manager Hadoop Cluster Installation Evaluation
I decided to give Cloudera's Manager software a try, and was pleasantly surprised at how simple it becomes to deploy a substantial Hadoop cluster.
I began by creating an automated kickstart installer for RHEL 6.2 (booting off a custom isolinux image created for this purpose) with all of the required packages, so that going from server power-on to a running 20+ node cluster takes less than 15 minutes. The number of concurrent node installs is limited by network and disk I/O bottlenecks on the deployment server. If you wanted to PXE boot the cluster in a production environment, you would ideally want a bank of servers behind a load balancer.
Once the Manager is installed on the master node, you simply log into the administration webpage, and from there, add all of the hosts to deploy the cluster on. One nice discovery was that it takes advantage of regular expressions for host names or IP addresses, so you can literally create a cluster containing hundreds of nodes with a trivial amount of effort.
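The host-pattern feature can be illustrated with a small expander. This is a minimal sketch assuming a bracketed numeric range syntax such as `node[01-20].example.com`; it is not Cloudera Manager's own parser, and `expand_host_pattern` and the host names are hypothetical. It shows how one short pattern fans out into many hosts.

```python
import itertools
import re
from typing import List

# Matches a bracketed numeric range such as [1-20] or [01-20].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_host_pattern(pattern: str) -> List[str]:
    """Expand every [start-end] range in `pattern` into concrete host names.

    Zero padding is kept when the start bound has a leading zero,
    e.g. node[01-03] -> node01, node02, node03.
    """
    # With two capture groups, split yields: literal, start, end, literal, ...
    parts = _RANGE.split(pattern)
    literals = parts[0::3]
    ranges = []
    for start, end in zip(parts[1::3], parts[2::3]):
        width = len(start) if start.startswith("0") else 0
        lo, hi = int(start), int(end)
        if lo > hi:
            raise ValueError(f"empty range [{start}-{end}] in {pattern!r}")
        ranges.append([str(n).zfill(width) for n in range(lo, hi + 1)])
    hosts = []
    # Cartesian product handles patterns with more than one range.
    for combo in itertools.product(*ranges):
        name = literals[0]
        for value, literal in zip(combo, literals[1:]):
            name += value + literal
        hosts.append(name)
    return hosts
```

For example, `expand_host_pattern("rack[1-4]-node[01-50].example.com")` yields 200 host names from a single line of input.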
Once the software is deployed, you can select the roles for each of the servers. It's an incredibly painless deployment. That being said, it is not without its flaws.
One of the primary flaws is that all of the configuration and log files are in non-standard locations and are split in non-standard ways. The arrangement clearly simplifies programmatic deployment, but it also makes it harder for a human who is used to standard Hadoop deployments to figure out where everything is located.
Finally, I discovered a bug in one of the packaged software products, Oozie. One of its resource files, oozie-bundle-0.1.xsd, contains an invalid regular expression on line 22. I haven't tracked down the cause, but for some reason JDK 1.6u30 will parse that invalid regex, while JDK 1.7u2 exits with errors. Naturally, I was running JDK 1.7u2, so it took me a little extra time to debug the problem.
Overall, I quite liked Cloudera's Manager. It's certainly one of the better cluster deployment products I've seen.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Enterprise Data Architect at a pharma/biotech company with 11-50 employees
Used for big data analytics, data sharing, and reporting
Pros and Cons
- "Cloudera, as a whole, is designed to provide organizations with solutions for big data."
- "The performance of some analytics engines provided by Cloudera is not that good."
What is our primary use case?
We mostly use the solution for big data analytics, data sharing, and reporting.
What is most valuable?
Cloudera, as a whole, is designed to provide organizations with solutions for big data. Cloudera is not one single component. It has many components related to storage, analytics, queries, and processing. All of these components work together to support big data implementation and analytics.
What needs improvement?
The performance of some analytics engines provided by Cloudera is not very good, so we also use other analytics tools alongside Cloudera.
For how long have I used the solution?
I have been using the solution for more than four years.
Which solution did I use previously and why did I switch?
We also use other tools like DataIQ and Apache Kudu.
What other advice do I have?
I work with the solution myself, and as a company, we implement it for other customers. Cloudera itself does not provide analytics. It prepares data for analytics tools that work with big data, such as Apache Spark, DataIQ, and Tableau.
Overall, I rate the solution a nine out of ten.
Disclosure: My company does not have a business relationship with this vendor other than being a customer.
Popular Comparisons
MongoDB Enterprise Advanced
Microsoft Azure Cosmos DB
Apache Spark
IBM Netezza Performance Server
Couchbase Enterprise
Neo4j Graph Database
IBM Spectrum Computing
HPE Data Fabric
Apache HBase
DataStax Enterprise
Oracle NoSQL
Hi
Can I get Cloudera's Manager software for free to test and deploy it in a sandbox for POC purposes?