Amazon EMR offers scalability and cost-effectiveness, with auto-scaling and managed services that make it easy to use. It integrates seamlessly with Hadoop, HDFS, and many other open-source frameworks. Users appreciate its stability, reliability, and high processing efficiency with tools such as Hive, Spark, and Flink. EMR provides robust data management with flexible cloud storage options. Users value its secure workflow management, and its resource-based pricing supports large-scale data processing without the overhead of managing hardware.
- "I rate Amazon EMR as ten out of ten."
- "Amazon EMR has multiple connectors that can connect to various data sources."
- "The security of the managed workflow and the managed services are the best features for us. Since we inherited their security model and it's all managed services, those are the key benefits for our clients."
Users say Amazon EMR needs a better user interface, tighter integration with tools such as Hive and Prometheus/Grafana, and greater stability. Configuration is complex. Cost control and scaling need optimization. Cluster startup is slow, and compatibility with legacy versions is problematic. Users consider automated cluster resizing and better support crucial, and they want improved monitoring, debugging, and web support. They also suggest more flexible features and improvements in CI/CD, MLOps, and data storage management.
- "There is room for improvement with respect to retries, handling the volume of data on S3 buckets, cluster provisioning, scaling, termination, security, and integration between services like S3, Glue, Lake Formation, and DynamoDB."
- "Spark jobs take longer on Amazon EMR compared to previous experiences."
- "The solution can become expensive if you are not careful."