Apache Hadoop Reviews

Name: Apache Hadoop
Brand: Apache
Vendor: Apache
Rating: 4.0 out of 5 (41 reviews)
89% willing to recommend


What is Apache Hadoop?

Apache Hadoop provides a scalable, cost-effective open-source platform capable of handling vast data volumes with features like HDFS, distributed processing, and high integration capabilities.


Apache Hadoop is the #8 ranked solution among top Data Warehouse solutions. PeerSpot users give Apache Hadoop an average rating of 8.0 out of 10. Apache Hadoop is most commonly compared to Dell PowerStore. It is popular among the large enterprise segment, which accounts for 55% of users researching this solution on PeerSpot. The top industry researching this solution is financial services, accounting for 27% of all views.


Featured Apache Hadoop reviews

Nick Rapoport

Financial Advisor at a financial services firm with 10,001+ employees

Hadoop was used for years, but there were problems since the people who originally set it up left the firm. The group that owned it later didn't have the technical resources to properly maintain it. Although there was nothing wrong with Hadoop itself, issues arose without proper management and upgrades.


YuQing Ding

Principal Network and Database Engr at Parsons Corporation

The best features of Apache Hadoop are that we use it to analyze with AI, machine learning, and everything. The data replication feature of Apache Hadoop is notable, though I'm not that advanced in using it. I assess Apache Hadoop's fault tolerance during hardware failures positively since we have hardware failover, which works without problems. We have dual servers everywhere, and Apache Hadoop is installed in these dual servers. Apache Hadoop helps us in cases of hardware failure because it works 24/7, and sometimes servers crash in the field. Not all of the servers are in a data center, and Apache Hadoop helps us to failover and analyze. It's very powerful.


Cheikh Mbengue

Database Administrator at Lacoste

Our use case is for a customer who wants to migrate their data warehouse to Hadoop. It's a request from a customer in Senegal who wants to migrate their Oracle data warehouse to Hadoop. I'm trying to migrate it to Hive or HBase. They're choosing between upgrading Oracle or moving to Cloudera…


Apache Hadoop mindshare

As of June 2026, the mindshare of Apache Hadoop in the Data Warehouse category stands at 3.2%, down from 4.4% a year earlier, according to calculations based on PeerSpot user engagement data.

Data Warehouse Mindshare Distribution
Product	Mindshare (%)
Apache Hadoop	3.2%
Snowflake	9.3%
Teradata	8.7%
Other	78.8%


PeerResearch reports based on Apache Hadoop reviews

Type	Title	Date
Category	Data Warehouse	Jun 23, 2026
Product	Reviews, tips, and advice from real users	Jun 23, 2026
Comparison	Apache Hadoop vs Snowflake	Jun 23, 2026
Comparison	Apache Hadoop vs Oracle Exadata	Jun 23, 2026
Comparison	Apache Hadoop vs Teradata	Jun 23, 2026

Valuable Features

Apache Hadoop's most valuable features include HDFS and its scalability. Users appreciate its distributed file system for storing diverse data types and its powerful processing capabilities. Its cost-effectiveness, open-source nature, and ability to handle large data volumes efficiently make it appealing. Users value integration with various tools and its support for massive parallel processing, providing flexibility and fault tolerance. Spark enhances analysis capabilities, while Ambari simplifies configuration and management tasks.

"Hadoop has also made it feasible to have all the data available in one area."
"The most important feature is its ability to handle large volumes, as some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume, so its processing power and speed are most valuable."
"Real-time streaming and integration using Spark streaming and the ecosystem of Spark technologies inside Hadoop."
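
The parallel processing model that reviewers value can be illustrated with a toy word count in the MapReduce style. This is a minimal single-process sketch, not the Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and on a real cluster each input split would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values collected for one key."""
    return key, sum(values)


def word_count(splits):
    """Run map, shuffle, and reduce over a list of input splits."""
    mapped = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for a split that a separate node would process.
    print(word_count(["big data big", "data lake"]))
```

Because the map step only looks at one split at a time, adding nodes lets more splits be processed at once, which is where the speed-up on large volumes comes from.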

Room for Improvement

Apache Hadoop requires improvements in user interface and user-friendliness due to its complex setup and steep learning curve. Its data replication could be optimized to reduce redundancy. Compatibility issues with components like Apache Oozie and performance challenges with processing speed and incremental data are noted. Users express a desire for enhanced security features, improved documentation, and advanced analytics tools. Technical support and community engagement require strengthening. Enterprise features like real-time processing and better integration are needed.

"Apache Hadoop's real-time data processing is weak and is not enough to satisfy our customers, so we may have to pick other products."
"In terms of processing speed, I believe that some of this software as well as the Hadoop-linked software can be better."
"We have plans to increase usage and this is where we've realized that when we have all these clusters and we're running queries and analyzing, we are facing some latency issues."

ROI

Apache Hadoop offers valuable storage and processing capabilities at a lower cost. Clients experience varying levels of return, largely dependent on how they leverage analytics. While calculating ROI is challenging, several users have benefited from significant cost savings by optimizing compute resources based on workload demands.

Pricing

Apache Hadoop pricing can vary significantly based on the distribution and deployment model. While it is open source and free, enterprise versions like Cloudera may involve licensing costs. Costs are influenced by data storage and compute needs; on-premises deployments are often more expensive than cloud solutions. Enterprises appreciate its scalability for large datasets, though medium and small businesses may find it costly. Users often choose cloud services from providers like Amazon to manage expenses effectively.

"The product is open-source, but some associated licensing fees depend on the subscription level."
"It's reasonable, but there's room for improvement in cost-effectiveness."
"For any big enterprise the costs can be handled, and it is suitable for big enterprises because the scale of data is large. For medium and small enterprises, the tool is on the high-price side."

Popular Use Cases

Organizations use Apache Hadoop for data aggregation, analytics, and storage. They create data lakes for streaming dashboards, integrate legacy systems, and manage enterprise data hubs. Apache Hadoop supports big data engineering, ETL processes, and real-time streaming using Spark. It handles large data volumes, unstructured data analysis, and serves as a data warehouse or hub. Additionally, it aids in content management, network monitoring, anti-fraud systems, and security for internal applications.

Service and Support

Customer service quality varies by Hadoop distributor, with Hortonworks rated highly. Apache Hadoop relies on community and forums for support, offering helpful documentation. Cloudera and other vendors provide premium support packages, helpful for complex issues. Many find online resources and forums sufficient for resolving problems. Some companies use external vendors or in-house teams, while official Apache support is limited. Technical support satisfaction varies, often reliant on distribution-specific services.

Deployment

Apache Hadoop's initial setup varies in complexity. Many users find it challenging without prior expertise. Tools like Ambari help simplify the process. Scaling nodes or integrating with additional tools adds complexity. Security, data migration, and precise configuration are common hurdles. Setup time ranges from days to months based on expertise and environment. Cloud deployments often ease the process. Maintenance typically requires skilled personnel to ensure stability and address unforeseen issues.

Scalability

Apache Hadoop is known for its scalability, handling large-scale data efficiently. Many users report it scales better in cloud environments than on-premises. It supports horizontal scaling, allowing easy addition of nodes for increased demands. Companies with diverse user bases, including data engineers and business users, find it flexible, supporting thousands of users and large data volumes. Its design allows efficient management, making it suitable for enterprises needing reliable data processing at scale.
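
As a rough illustration of what horizontal scaling means for capacity planning, the sketch below estimates how many data nodes a given volume would need. It assumes HDFS's default replication factor of 3; the per-node disk size and the 25% free-space headroom are hypothetical values chosen for the example, not figures from the reviews.

```python
import math


def raw_capacity_tb(logical_tb, replication=3, headroom=0.25):
    """Raw disk (TB) needed to store logical_tb of data.

    replication: copies kept of each block (HDFS default is 3).
    headroom: fraction of raw disk kept free (assumed value).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    return logical_tb * replication / (1.0 - headroom)


def nodes_needed(logical_tb, node_disk_tb, replication=3, headroom=0.25):
    """Number of data nodes required, rounded up to a whole node."""
    return math.ceil(raw_capacity_tb(logical_tb, replication, headroom) / node_disk_tb)


if __name__ == "__main__":
    # 100 TB of data on hypothetical nodes with 48 TB of disk each.
    print(raw_capacity_tb(100))
    print(nodes_needed(100, 48))
```

Growing the data set changes only the node count, which is why adding nodes is the usual way to extend a cluster.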

Stability

Users report stable experiences with Apache Hadoop since transitioning from 1.x. Stability challenges arise mostly from startup procedures, insufficient memory, and configuration management. Many achieve high stability, citing its capability to handle large datasets. Versions consistently improve, enhancing reliability. Some encounter occasional issues during extensive processes or require careful management of components. These stability experiences are generally favorable, with most users rating it between seven and nine out of ten, noting improvements and adaptability as strengths.

These insights are based on the in-depth reviews provided by peers to help you make a better buying decision.


Review data by company size

By reviewers
Company Size	Count
Small Business	11
Midsize Enterprise	7
Large Enterprise	16


By visitors reading reviews
Company Size	Count
Small Business	77
Midsize Enterprise	42
Large Enterprise	147
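
The visitor counts can be turned into percentage shares with a few lines of arithmetic. The sketch below is illustrative only; the dictionary simply restates the table, and the helper name shares is hypothetical. Large enterprises come out at roughly 55% of visitors.

```python
# Visitor counts by company size, copied from the table.
visitors_by_company_size = {
    "Small Business": 77,
    "Midsize Enterprise": 42,
    "Large Enterprise": 147,
}


def shares(counts):
    """Return each category's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * count / total, 1) for name, count in counts.items()}


if __name__ == "__main__":
    for name, pct in shares(visitors_by_company_size).items():
        print(f"{name}: {pct}%")
```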


Top industries

By visitors reading reviews

Financial Services Firm	27%
Construction Company
Outsourcing Company
Manufacturing Company
Computer Software Company
Comms Service Provider
University
Government
Performing Arts
Marketing Services Firm
Healthcare Company
Energy/Utilities Company
Transportation Company
Real Estate/Law Firm
Recreational Facilities/Services Company
Wholesaler/Distributor
Educational Organization
Media Company
Non Profit
Retailer
Hospitality Company
Logistics Company
Renewables & Environment Company
Insurance Company


Learn more about Apache Hadoop

Apache Hadoop is known for its distributed file system HDFS, which supports large data volumes efficiently. Its open-source nature allows cost-effective scalability and compatibility with tools like Spark for enhanced analytics. While it offers significant processing power, areas for improvement include user-friendliness, interface design, security measures, and real-time data handling. Users benefit from data storage for structured and unstructured data, facilitated by its distributed processing architecture. Data replication ensures fault tolerance, while its capability to integrate with tools like Apache Atlas and Talend highlights its versatility.

What are the key features of Apache Hadoop?

HDFS: Distributed file system managing vast data scales.
Scalability: Seamlessly add nodes to increase capacity.
Integration: Works with tools such as Spark to enhance analytics.
Data Replication: Ensures fault tolerance and reliability.
High Throughput: Favors sustained throughput for batch processing over low-latency access.
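
To show why data replication gives fault tolerance, here is a toy model in which every block is copied to several distinct nodes and stays readable while at least one copy survives. It is a sketch under simplifying assumptions: the round-robin placement and the function names are hypothetical, and real HDFS uses a rack-aware placement policy instead.

```python
def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a healthy node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    }


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_replicas(4, nodes)
    # With three copies per block, losing any two nodes loses no data.
    print(readable_blocks(placement, {"n1", "n2"}))
    # Losing three nodes makes the block stored only on those three unreadable.
    print(readable_blocks(placement, {"n1", "n2", "n3"}))
```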

Why choose Apache Hadoop based on user reviews?

Cost-Effectiveness: Open-source framework reduces expenses.
Data Versatility: Supports multiple data types and sources.
Processing Power: Efficiently handles distributed data processing.
High Storage Capacity: Manages large data volumes efficiently.
Expandability: Easily integrates with numerous data processing tools.

Industries leverage Apache Hadoop for Big Data analytics, data lakes, ETL tasks, and enterprise data hubs, handling unstructured and structured data from IoT, RDBMS, and real-time streams. Its applications extend to data warehousing, AI/ML projects, and data migration, employing tools like Apache Ranger, Hive, and Talend for effective data management and analysis.

Apache Hadoop customers

Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab

Product Categories

Data Warehouse

Popular Comparisons

Dell PowerStore vs Apache Hadoop

Teradata vs Apache Hadoop

Snowflake vs Apache Hadoop

VMware Tanzu Data Solutions vs Apache Hadoop

Oracle Exadata vs Apache Hadoop

OpenText Analytics Database (Vertica) vs Apache Hadoop

Amazon Redshift vs Apache Hadoop

IBM Netezza Performance Server vs Apache Hadoop

SAP IQ vs Apache Hadoop

Oracle Database Appliance vs Apache Hadoop

Actian Ingres vs Apache Hadoop

SAP BW4HANA vs Apache Hadoop

IBM Db2 Warehouse vs Apache Hadoop

Microsoft Parallel Data Warehouse vs Apache Hadoop

Infobright DB vs Apache Hadoop


Apache Hadoop Reviews Summary
Author info	Rating	Review Summary
Financial Advisor at a financial services firm with 10,001+ employees	4.0	We had a limited on-premises deployment of Apache Hadoop which scaled well and was reliable, but maintaining it was challenging due to a lack of resources and expertise after the original team left. We prefer solutions with structured support.
Principal Network and Database Engr at Parsons Corporation	4.5	I use Apache Hadoop daily to analyze unstructured incident data, benefiting from its AI and machine learning capabilities, strong failover support, and fault tolerance, especially useful in our dual-server setup and field environments prone to hardware failures.
Database Administrator at Lacoste	4.5	I am working on migrating a customer's data warehouse from Oracle to Hadoop, specifically considering Cloudera, due to its data warehouse capabilities and scalability. The integration of reporting tools like Power BI with Hadoop poses some challenges.
Software Development Consultant at Synechron	4.0	I use Apache Hadoop in my company for its efficient analytical processing and organized data distribution. However, it struggles with incremental data processing and has high licensing costs. Setup and technical support need improvement for better user experience and resolution times.
IT Support Specialist at Convergys Corporation	4.5	We used Apache Hadoop mainly for data analysis and storage, benefiting from its low cost, open-source nature, and efficient performance on commodity hardware. While its flexibility and resilience are advantageous, improved security measures would enhance its capabilities.
Head of Data at an energy/utilities company with 51-200 employees	4.5	Apache Hadoop's distributed computing capability efficiently accelerates data processing by distributing tasks across multiple nodes. While it offers cost savings on compute resources through optimization, the availability of comprehensive training materials could be improved to enhance onboarding and skill development.
Senior Associate Consultant at Applied Materials	3.0	I use Apache Hadoop for data storage and report generation, appreciating its open-source nature, ability to handle large data volumes, and effectiveness in data processing and storage. However, limited support requires deeper knowledge and improvisation.
Head Of Data Governance at Alibaba Group	4.0	We use the Hadoop File System for big data due to its open-source nature and cost-effectiveness. However, dealing with data skewness requires custom solutions, unlike Spark, which is more efficient and faster due to in-memory processing.
Senior Data Architect at Yettel	4.0	I've been using Apache Hadoop for data collection, finding it effective for file storage and maintenance, though its stability needs improvement. Calculating ROI is difficult, and while we explored Azure, Hadoop proved more acceptable for our needs.
Manager at Robi Axiata Limited	3.5	I use Apache Hadoop for big data analysis and AI/ML, valuing its processing for newer technologies. However, I find its user-friendliness lacking, as GUI changes require advanced coding. Stability is good, and setup was easy.

Nick Rapoport

Financial Advisor at a financial services firm with 10,001+ employees

Mar 27, 2025

Reliable performance maintained but requires ongoing management and support

What is our primary use case?

We had some Apache Hadoop, but very little. It was an on-premises deployment of Apache Hadoop, which we are trying to get away from. It was holding up just fine. All the data that we now have in ADLS was in Apache Hadoop.


What is most valuable?

Apache Hadoop was reliable. However, you can't stop managing it. If you don't do the upgrades, the platform ages out, and that's what happened to the Hadoop content. It scaled well. Hadoop is a distributed file system, and it scales reasonably well provided you give it sufficient resources.

What needs improvement?

The problem with Apache Hadoop arose when the guys that originally set it up left the firm, and the group that later owned it didn't have enough technical resources to properly maintain it. This was part of the problem, apart from just aging out due to lack of resources.

For how long have I used the solution?

I don't have a complete history since the deployment, and I have not had extensive experience with it.

What was my experience with deployment of the solution?

Hadoop was initially set up well. However, when the team managing it shifted their project funding away from Hadoop, the maintenance and upgrades ceased, leading to aging.

What do I think about the stability of the solution?

Apache Hadoop was reliable. However, continuous management in the way of upgrades and technical management is necessary to ensure that it remains effective. The setup continued working, but the lack of management resulted in the platform aging out.

What do I think about the scalability of the solution?

Hadoop scaled well. It is a distributed file system and scales reasonably well as long as it is given sufficient resources. All the data we now have in ADLS was in Apache Hadoop, and it handled it well.

How are customer service and support?

I have not had to contact Apache for support for Hadoop. Being open source, support consists of community responses: if someone wants to address your issue, they can. It's not structured support, which is why we don't use purely open-source projects without additional structured support.

How would you rate customer service and support?

Neutral

How was the initial setup?

While I have not set up Hadoop here, I did set it up at home when I was first experimenting with it on two to three nodes. It was very straightforward and not difficult at all.

Which other solutions did I evaluate?

We're a bank, so we prefer solutions with reliable support over open-source projects without structured support.

What other advice do I have?

When I've dealt with open-source technologies, it's crucial that there's a mechanism for issue escalation. Structured support is invaluable, which is why I appreciate having technical account managers I can contact. I would rate Apache Hadoop a seven to eight.

Which deployment model are you using for this solution?

On-premises

YuQing Ding

Principal Network and Database Engr at Parsons Corporation

Aug 22, 2025

Data analysis becomes more reliable with enhanced incident handling and system failover

What is our primary use case?

My use cases for Apache Hadoop include the setups I completed, connecting to the database, and analyzing the incidents, for which it is a good tool. Apache Hadoop helps us analyze all of the accidents escalated to a higher level.

I use Apache Hadoop for analyzing unstructured data because we have numerous incident reports. It actually uses Mango, I remember.

I rely on Apache Hadoop every day for data analysis, and it helps with failover.

What is most valuable?

The best features of Apache Hadoop are that we use it to analyze with AI, machine learning, and everything.

The data replication feature of Apache Hadoop is notable, though I'm not that advanced in using it.

I assess Apache Hadoop's fault tolerance during hardware failures positively since we have hardware failover, which works without problems. We have dual servers everywhere, and Apache Hadoop is installed in these dual servers.

Apache Hadoop helps us in cases of hardware failure because it works 24/7, and sometimes servers crash in the field. Not all of the servers are in a data center, and Apache Hadoop helps us to failover and analyze. It's very powerful.

Cheikh Mbengue

Database Administrator at Lacoste

Jul 1, 2024

Good fit for telecom sector, and has good data warehousing features

What is our primary use case?

The customer is choosing between upgrading Oracle and moving to Cloudera Hadoop. They seem to prefer Cloudera.

The current data warehouse runs on Oracle DB, but we have to migrate the analytics process to Hadoop.

What is most valuable?

My customers like the HDFS and the data warehouse capabilities within Hadoop.

They have integrated other tools as well, like Power BI and Oracle BI, both on Azure, for reporting. It is difficult to integrate them with Hadoop, Oracle BI in particular.

For how long have I used the solution?

I have a little experience with it. I'm a database expert, not a Hadoop expert, so I haven't worked extensively with it.

What do I think about the scalability of the solution?

If the customer chooses it, the solution must be scalable. They're a telecom company and need a high-level solution.

Hadoop is quite scalable, and that's the main requirement. They want a solution that can handle 100 terabytes for the first step.

How are customer service and support?

I contacted only an integrator who works on the design process.

Which solution did I use previously and why did I switch?

Oracle doesn't provide Big Data appliances now.

How was the initial setup?

The client doesn't have experience with Hadoop. If they choose this solution, there will be training.

The integrator will provide a training program.

It took three months for migration.

What's my experience with pricing, setup cost, and licensing?

The solution is expensive, but that's for the customer to decide.

We already have the cost of the materials and the license. When we compare pricing with other integrators, we can't see any specific difference.

It's an annual license. The customer is part of a larger company that already uses Hadoop. They know what they want and have a use case in mind, which is why they asked us for a proposal. We just give the pricing and information.

Which other solutions did I evaluate?

The customer is comparing it to Orange, another telecom operator in Senegal, who uses Hadoop successfully.

The main reason to choose Apache is the comparison to Orange, and that Hadoop is very scalable.

What other advice do I have?

It's a good solution. Other companies have used it for over ten years with great success.

For the telecom sector, I would give it a nine out of ten.

I recommend it for the telecom sector. I know it well, and it's a good fit.

SushilArya

Software Development Consultant at Synechron

Jun 25, 2024

Provides ease of integration with the IT workflow of a business

What is our primary use case?

I use the solution in my company because it makes analytical processing easy. It takes data into one cluster and then processes it. Whatever GPU I am working on and whatever the analytics are, I can say that processing is very fast and I get insight from the data.

What is most valuable?

The main features of the tool are its distribution and how it organizes data into clusters. Everything is very organized in Apache Hadoop.

What needs improvement?

When working with Kafka, I saw that the data came in incrementally. Incremental data processing is still not very effective in Apache Hadoop. If the data is already there, it can be processed very effectively. If the data is coming in every second, for example when you want to know the location of something every second, then it is not processed effectively in Apache Hadoop.

Another area where improvement is required is the licensing cost of the tool. If the tool offered a pay-per-use licensing structure, similar to how AWS operates, organizations could get the look and feel of Apache Hadoop. Apache Hadoop could also look into its capability for processing incremental data.

The tool's setup process is another area for improvement. It is not very simple because, during setup, we need to do all the server settings, including port listing and firewall configurations. Compared with other products on the market, it could be made simpler.

There are certain shortcomings when it comes to the product's technical support part, making it an area where improvements are required.

The time frame for the resolution is an area that needs to be improved. The overall communication part of the technical support team also needs improvement.

For how long have I used the solution?

I have been using Apache Hadoop for three years.

What do I think about the stability of the solution?

Stability-wise, whenever a big sale such as Big Billion Days happens, the tool remains stable. I rate the solution's stability an eight out of ten.

What do I think about the scalability of the solution?

It is a reliable product. If a business scales up, it can still manage the data and the processing. Scalability-wise, I rate the solution an eight out of ten.

The product is suitable for enterprise-sized companies with more than 1,000 employees.

How are customer service and support?

I rate the technical support a seven out of ten.

How would you rate customer service and support?

Neutral

How was the initial setup?

The product's initial setup phase is not very simple.

I required one DevOps engineer for the product's installation phase.

The solution is deployed on the cloud.

For configuration, we take about three days.

What's my experience with pricing, setup cost, and licensing?

For any big enterprise the costs can be handled, and it is suitable for big enterprises because the scale of data is large. For medium and small enterprises, the tool is on the high-price side.

What other advice do I have?

It is easy to integrate Hadoop with your IT workflow since I am using the cloud version only. In the cloud, I could install or put my data, but I did not have to get everything installed on my machine. The cloud can be accessed from anywhere, making it a really valuable tool.

I am building a GenAI model, and so, in our company, we are making a data pipeline that we have integrated into Apache Kafka. Earlier, we also included Apache Hadoop in the process. We have integrated the product into some AI tools. Whatever data is getting processed is coming out, and new data is coming in and getting processed. We have integrated the tool into some LLM models.

I will recommend the tool to any enterprise company that has an employee strength of more than 1,000.

I rate the tool an eight out of ten.

Kenechukwu Murphy Ezeoka

IT Support Specialist at Convergys Corporation

Sep 4, 2024

Enables efficient data warehousing and supports a large ecosystem of tools

What is our primary use case?

We used the product primarily for data analysis and storage. It helps handle large data sets, performing tasks like filtering, sorting, and joining. The platform is useful for data warehousing and provides distributed coordination and synchronization functionalities.

How has it helped my organization?

The solution has effectively supported our operations primarily due to its cost efficiency. It enables us to manage large data sets without incurring excessive subscription costs, resulting in more efficient data handling and operations.

What is most valuable?

The platform's most valuable feature is its low cost and open-source nature. It runs efficiently on commodity hardware and supports a large ecosystem of tools. Its flexibility in handling and storing large volumes of data is particularly beneficial, as is its resilience, which ensures data redundancy and fault tolerance.

What needs improvement?

Improvements in security measures would be beneficial, given the large volumes of data handled. Robust security features are essential to prevent data leaks or breaches. Additionally, integrating advanced capabilities similar to those of other solutions would enhance the platform's functionality.

For how long have I used the solution?

I have worked with Apache Hadoop for about six to seven months. The duration varied based on the projects I was involved in, as we often switched to different projects with different applications.

What do I think about the stability of the solution?

Although I have encountered some performance issues, the platform has proven to be stable.

I would rate its stability as eight or nine.

What do I think about the scalability of the solution?

This platform's scalability significantly impacts data management capabilities. It allows for simultaneously handling large data volumes, which other applications might struggle with.

How are customer service and support?

I have contacted Apache tech support when encountering issues that could not be resolved internally. Their support is reliable and responsive.

How would you rate customer service and support?

Positive

How was the initial setup?

The initial setup can be complex due to the need for precise coding and configuration. Setting up the required components and ensuring all dependencies are correctly configured is crucial for a successful deployment. Depending on network capability and system specifications, the setup typically takes 30 minutes to one hour if prerequisites are met. Maintenance involves regular updates to ensure the platform runs with the latest features and security patches.

What's my experience with pricing, setup cost, and licensing?

The product is open-source, but some associated licensing fees depend on the subscription level. While it might be free for students, organizations typically need to pay for their subscriptions. The fees were reasonable for my usage, though I am not aware of recent changes to the pricing.

What other advice do I have?

The product is highly effective for processing and managing large data sets. Integrating it with other solutions like AWS can provide additional functionalities, but the cost benefits of using this platform remain significant. I have also used the solution in AI-driven projects with machine learning models, and its integration with Apache Spark has been advantageous. I recommend it to organizations needing to handle large data sets due to its cost-effectiveness and robust capabilities.

I rate it a nine out of ten.

Madhan Potluri

Head of Data at an energy/utilities company with 51-200 employees

Jul 9, 2024

Distributes data processing tasks across multiple nodes and has a straightforward setup process

What is most valuable?

The product's distributed computing capability is the most effective. It allows us to distribute data processing tasks across multiple nodes, significantly speeding up processing time.

What needs improvement?

The product's availability of comprehensive training materials could be improved for faster onboarding and skill development among team members.

For how long have I used the solution?

I've been working with Apache Hadoop for about four years now.

What do I think about the stability of the solution?

We encounter occasional issues like memory constraints during extensive data processing. They have been manageable through scaling adjustments.

I rate the stability an eight or nine.

What do I think about the scalability of the solution?

Hadoop's scalability can be rated a nine out of ten due to its exceptional flexibility, allowing horizontal scaling by adding or removing nodes as required.

How are customer service and support?

Technical support has been decent, although there can be delays in resolving new or complex issues.

How would you rate customer service and support?

Neutral

How was the initial setup?

The product deployment process was quite straightforward, taking only a few minutes for installation.

What was our ROI?

The platform has resulted in significant cost savings on compute resources by optimizing usage based on workload demands.

What's my experience with pricing, setup cost, and licensing?

I would rate the product's subscription-based pricing a six out of ten. It's reasonable, but there's room for improvement in cost-effectiveness.

What other advice do I have?

The platform's quick data processing capabilities have been instrumental in supporting our AI-driven projects. I would recommend it, especially for organizations dealing with large-scale data processing and needing robust distributed computing capabilities.

I would rate Apache Hadoop an eight out of ten.

Akhilesh Chipre

Senior Associate Consultant at Applied Materials

Apr 11, 2024

Handles huge data volumes and lets you create your own workflows and tables, but you need deeper knowledge

What is our primary use case?

We use it to store data. Our team then takes this data to create reports on top of that.

How has it helped my organization?

We primarily use Kafka for intensive data streaming. For batch-based processing, we use Hadoop. Additionally, we have our own custom batch catalog that likely helps prepare data for further analysis or use.

We have many projects where our main data storage is done in Hadoop only. All projects take data from Hadoop to provide data insights and reports.

Hadoop YARN for resource management is a really good aspect. It is very good for managing large data volumes. It allows us to monitor data processing effectively. We can see how much data there is, the consumption of RAM or ROM, and how resources are allocated. It's good for managing and previewing the scale of data processing.

What is most valuable?

It's primarily open source. You can handle huge data volumes and create your own views, workflows, and tables. I can also use it for real-time data streaming.

Its ability to handle open data access is significant, and the support is substantial, though not as responsive as one might hope. It's very effective for data processing and storage.

Overall, it's very good for data processing and storage.

What needs improvement?

Since it is an open-source product, there won't be much support.

So, you have to have deeper knowledge. You need to improvise based on that.

For how long have I used the solution?

I have been using it for five years.

What do I think about the stability of the solution?

It is a pretty stable product.

What do I think about the scalability of the solution?

It is easy to scale. Almost everyone uses it, so there are 1000 end users.

How are customer service and support?

It is open source. There is no support.

How would you rate customer service and support?

Neutral

How was the initial setup?

The installation is a bit complex. We have to have a very good knowledge about the product.

The deployment took around ten hours.

What about the implementation team?

We needed two to three architects. There were support people also.

What other advice do I have?

Hadoop is a good database, and it's open-source, which makes it cost-effective. But, if you have a large budget and allocations, you can go with products that have advanced analytics tools or other extra features, like data lineage.

Overall, I would rate the solution a seven out of ten.

Syed Afroz Pasha

Head Of Data Governance at Alibaba Group

Feb 27, 2024

Compatible with almost all the query engines and is open-sourced

What is our primary use case?

We use the Hadoop File System. We usually keep the data for our tables or big data on it. Hadoop has a query engine called Hive. We write SQL queries, and the tool usually processes in a parallel environment and gets us the data on Hive.

What is most valuable?

Hadoop File System is a perfect choice if we want to use any database systems or file systems because it is open-source. It has no cost. Or else, we’ll have to use Amazon S3 or Azure database, for which we will have to pay a lot. A lot of big data processing needs a proper partition and structure. Hadoop File System is compatible with almost all the query engines. That’s another reason why people would be very comfortable working with the Hadoop ecosystem.

What needs improvement?

The tool provides functionalities to deal with data skewness or a diverse set of data. There are some configurations that it usually provides. In certain cases, the configurations for dealing with data skewness do not make any sense. We usually have to deal with it using a custom solution.

Spark would deal with such cases efficiently. If Hadoop solved these issues the way Spark does, it could compete with Spark at the same level. Hive is a little slower than Spark. Both process data in parallel, but Spark works in memory and Hive does not.
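For readers unfamiliar with skew workarounds, one widely used custom technique is key salting: a hot key is split into several sub-keys so its records spread over more workers, and the partial results are merged afterwards. The sketch below is an assumption about what such a workaround could look like, not the reviewer's actual solution; `salted_key` and `aggregate_with_salting` are hypothetical names and the code runs on plain in-memory Python data rather than on Hive or HDFS.

```python
import random
from collections import Counter, defaultdict


def salted_key(key, num_salts, rng):
    """Append a random salt so one hot key spreads over num_salts sub-keys."""
    return f"{key}#{rng.randrange(num_salts)}"


def aggregate_with_salting(records, num_salts=4, seed=0):
    """Sum values per key in two stages to soften the effect of a hot key."""
    rng = random.Random(seed)
    # Stage 1: partial sums per salted key (this is the part that would
    # run in parallel, with each salted key going to a different worker).
    partial = Counter()
    for key, value in records:
        partial[salted_key(key, num_salts, rng)] += value
    # Stage 2: strip the salt and merge the partial sums per original key.
    totals = defaultdict(int)
    for skey, subtotal in partial.items():
        original, _, _ = skey.rpartition("#")
        totals[original] += subtotal
    return dict(totals), dict(partial)
```

The trade-off is an extra aggregation stage in exchange for more even load: the second stage only touches at most `num_salts` partial rows per key, so it stays cheap even when one key dominates the input.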

For how long have I used the solution?

I have been using the solution for three and a half years.

What do I think about the scalability of the solution?

Around 70 to 80 people use the product in our organization.

How are customer service and support?

We can easily find support for Apache products. Support is a positive aspect of the solution.

How was the initial setup?

The installation is a little difficult because it is an open-source tool. It is similar to Apache Spark. The product is not self-manageable. We will have to invest a little in the setup.

What other advice do I have?

People who want to buy the solution must hire or work with someone who understands the architecture as per the use case. It should be good for the long run. Once Hadoop is set up, we can change the configuration, but the architecture cannot be changed frequently. We must invest more in the architecture. Once properly built, we can build or develop anything on it. Architecture is important for Hadoop. If the product is set up well, we will not find difficulties later. Overall, I rate the solution an eight out of ten.

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Alibaba

Miodrag Milojevic

Senior Data Architect at Yettel

Aug 1, 2023

A file system for data collection that contains needed information and files

What is our primary use case?

I have been using the latest version of Apache Hadoop. It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database.

What needs improvement?

Hadoop isn't so problematic. It deals with file storage and maintenance. It is a network of file operations.

The stability of the solution needs improvement.

For how long have I used the solution?

I have been using Apache Hadoop for more than three to four years.

What do I think about the stability of the solution?

There are some issues with file retention and its stability but they can be worked through. There are a lot of things that are based on disk space that require the preparation of different and sophisticated controls. The software itself is not unstable, but sometimes its options can cause stability issues.

What do I think about the scalability of the solution?

The scalability includes adding nodes and it is not so easy to do. It is a detailed process that requires precision.

There are almost 25 users, including data engineers and others, but no specialists. We plan to increase endpoint users and introduce running reports, automated reports, or reports based on some tools.

How are customer service and support?

Apache is an open source software and only has a community, instead of customer support. There is Cloudera which provides Apache Hadoop on license and offers support. In Cloudera, there are some consultants with less knowledge who offer support for small issues. As the case escalates, they provide more support with better technical expertise.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We checked a few solutions and tried solutions from Azure. There were pros and cons but this solution was more acceptable.

How was the initial setup?

The setup depends on the data. Vast data can be hard to set up. You might have some issues with the setup, but it depends on the number of nodes. More nodes can cause issues and more time to resolve. The reshuffling is also complex and can cause problems.

The on-premise setup can be difficult as it requires the subsequent setup of nodes while expanding. Cloud deployment can be easier but only supports other software.

What was our ROI?

The ROI is very hard to calculate. The source of data for the company can help to run different technologies and make many decisions based on the data, but it's very hard to calculate the return on investment.

What's my experience with pricing, setup cost, and licensing?

I am not updated with the licensing cost, but you need to pay for a license if purchased from Cloudera.

What other advice do I have?

If you plan to use Apache Hadoop, purchase the license from Cloudera because they provide you with technical support.

I rate the overall solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises

Juliet Hoimonthi

Manager at Robi Axiata Limited

Jul 5, 2022

Has good analysis and processing features for AI/ML use cases, but isn't as user-friendly and requires an advanced level of coding or programming

What is our primary use case?

I'm from the data governance team, and this is how my team uses Apache Hadoop: there's a GUI called Apache Atlas, which has an option called the "business glossary". My team uses the business glossary from Apache Atlas and also uses Apache Ranger. Apache Ranger is another GUI where you can check who is using which data source through the Apache Hadoop platform. My team also uses the Apache Hadoop platform for AI-related use cases. The data required for any AI use case is processed with ETL, specifically with the Talend tool. My team then loads the data into Apache Hadoop, organizes it into clusters, and uses it for AI/ML cases.

What is most valuable?

What I like about Apache Hadoop is that it's for big data, in particular big data analysis, and it's the easier solution. I like the data processing feature for AI/ML use cases the most because some solutions allow me to collect data from relational databases, while Hadoop provides me with more options for newer technologies.

What needs improvement?

What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe it's because I'm new to it. Sometimes it feels so tough to use, but it could be because of two aspects: one is my incompetency, for example, I don't know about all the features of Apache Hadoop, or maybe it's because of the limitations of the platform. For example, my team is maintaining the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, an advanced level of coding or programming needs to be done in the back end, so it's not user-friendly.

What do I think about the stability of the solution?

Apache Hadoop has good stability.

What do I think about the scalability of the solution?

I'm not sure how scalable Apache Hadoop is.

How are customer service and support?

In terms of technical support from Apache Hadoop, we are working with an external vendor and they are the ones helping us in every case. They are helpful.

Which solution did I use previously and why did I switch?

We used Oracle Exadata before using Apache Hadoop. It was one or two years ago when we started using the Apache Hadoop platform. We're still thinking about using both platforms in parallel or choosing one of the two. We're still looking into the benefits of each platform, but currently, we're using both Oracle Exadata and Apache Hadoop.

How was the initial setup?

I wasn't part of the team that set up Apache Hadoop, but using it after it was set up was very easy. The solution was ready immediately, and the GUI was smooth and fast, with no issues.

What about the implementation team?

Apache Hadoop was implemented by the IT team, so it was an in-house implementation.

What's my experience with pricing, setup cost, and licensing?

If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs.

What other advice do I have?

My company is using both Apache Hadoop and Oracle Exadata.

I'm unsure which version of Apache Hadoop I'm using, but it could be the latest version.

Currently, the solution is deployed on-premises because here in Bangladesh, there's a limitation with transferring data outside of the country. As far as I know, there's no cloud solution internally in Bangladesh, so if you want to use a cloud solution here, you'll have to move your data outside Bangladesh, and this is why Apache Hadoop is still deployed on-premises.

More than fifty people use Apache Hadoop directly, particularly the IT and analytics expert teams. The solution is being used by developers, people in operations, and people who maintain security.

In my company, Apache Hadoop is not fully implemented yet. It's still in the implementation phase and at least for the next two to three years, there isn't any plan of discarding it.

I'm giving Apache Hadoop a rating of seven out of ten.

I don't have any recommendations currently for people who want to implement Apache Hadoop because I'm still in the learning phase and I don't have much knowledge yet. The IT team in my company is also struggling every time in terms of preparing everything and still needs help from external vendors because the team isn't an expert on Apache Hadoop yet. My company's expertise is in Oracle Exadata because usage of that product started in 2002 or 2003.

My company is a customer of Apache Hadoop.

Which deployment model are you using for this solution?

On-premises

Title	Rating	Mindshare	Recommending	Interviews
Dell PowerStore	4.4	1.4%	97%	220
Teradata	4.1	8.7%	88%	83
