What is our primary use case?
Cloudera DataFlow is used as an ETL or ELT solution within Cloudera's data pipeline. Our organization heavily relies on it for data ingestion, transformation, and warehousing. It is also used daily for operational tasks, and it integrates well within Cloudera's ecosystem for high performance and throughput.
What is most valuable?
The most valuable feature of Cloudera DataFlow is its native connectivity with a wide range of components in the Cloudera ecosystem, such as Hive, Impala, and Spark. These connections ensure high performance and high throughput. Additionally, it allows for end-to-end scheduling of workflows without needing third-party applications.
What needs improvement?
Cloudera DataFlow's user interface could be enhanced significantly. Memory handling could also be improved.
For how long have I used the solution?
I have been using Cloudera DataFlow for about five or six years.
What do I think about the stability of the solution?
Overall, I would rate the stability of Cloudera DataFlow as eight out of ten.
What do I think about the scalability of the solution?
I would rate the scalability of Cloudera DataFlow as seven out of ten. It needs some enhancements to improve scalability.
How are customer service and support?
Customer service and support are responsive and knowledgeable. I would rate them eight out of ten.
Which solution did I use previously and why did I switch?
Previously, I used Informatica, IBM DataStage, Talend, and Ab Initio. We switched to Cloudera DataFlow because it is fully integrated with Cloudera and already included in Cloudera's license, which avoids the need for additional third-party licenses.
What about the implementation team?
We have an operations team that handles maintenance of Cloudera DataFlow. This team consists of several engineers with similar skill sets in development and operations.
What's my experience with pricing, setup cost, and licensing?
Cloudera is an expensive platform, and the licensing cost is high. I would rate its cost-effectiveness as nine out of ten. There are no extra expenses once the Cloudera license is acquired.
Which other solutions did I evaluate?
Before switching to Cloudera DataFlow, I evaluated Informatica, IBM DataStage, Talend, and Ab Initio.
What other advice do I have?
Cloudera DataFlow is fully compatible with Cloudera's ecosystem and offers high efficiency through native connectors for various ecosystems. However, the learning curve is high, and there is a shortage of skilled professionals. My overall rating for Cloudera DataFlow is eight out of ten.
Which deployment model are you using for this solution?
On-premises