
Apache Spark vs Pentaho Business Analytics comparison

 

Comparison Buyer's Guide

Executive Summary

Review summaries and opinions

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Categories and Ranking

Apache Spark
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 66
Ranking in other categories: Hadoop (1st), Compute Service (5th), Java Frameworks (2nd)

Pentaho Business Analytics
Average Rating: 8.0
Reviews Sentiment: 6.8
Number of Reviews: 44
Ranking in other categories: BI (Business Intelligence) Tools (20th), Cloud Operations Analytics (4th), Reporting (16th)
 

Mindshare comparison

Apache Spark and Pentaho Business Analytics aren’t in the same category and serve different purposes. Apache Spark is ranked in the Hadoop category, where it holds a mindshare of 17.8%, down 21.4% compared to last year.
Pentaho Business Analytics, on the other hand, is ranked in the BI (Business Intelligence) Tools category, where it holds a 0.5% mindshare, down 0.6% compared to last year.
 

Featured Reviews

Ilya Afanasyev - PeerSpot reviewer
Reliable, able to expand, and handle large amounts of data well
We use batch processing. It works well with our formats and file versions. There's a lot of functionality. Every hour, our pipeline copies the changes from MongoDB to a specific file. Previously, each run copied all of the data from every table, even when nothing had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cluster's cost and configuration, and we saved about $150,000. The solution is scalable. It's a stable product.
Sayan König - PeerSpot reviewer
Flexible, easy to understand, and simple to set up
The repository should be improved. It should support versioning, for example by integrating with Git repositories, so that processes can be checked out and have a traceable history. The solution could really be improved; there are too many bugs in our version.

Quotes from Members

 

Pros

Apache Spark
"The solution is very stable."
"The most valuable feature of Apache Spark is its memory processing because it processes data over RAM rather than disk, which is much more efficient and fast."
"The main feature that we find valuable is that it is very fast."
"I feel the streaming is its best feature."
"I like Apache Spark's flexibility the most. Before, we had one server that would choke up. With the solution, we can easily add more nodes when needed. The machine learning models are also really helpful. We use them to predict energy theft and find infrastructure problems."
"The most valuable feature of this solution is its capacity for processing large amounts of data."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"The data processing framework is good."
Pentaho Business Analytics
"Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, and needs to be able to search and analyze it."
"The most valuable feature of Pentaho is the Tableau report."
"The initial setup is pretty straightforward."
"I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
"It is robust, offers market intelligence, and utilizes modules effectively."
"We were able to install it without any assistance from tech support."
"Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
"The interface of Pentaho Business Analytics is easy to use and does not require high-level skills to operate the tool."
 

Cons

Apache Spark
"It needs to provide an internal scheduler for Spark jobs, with monitoring capability."
"There were some problems related to the product's compatibility with a few Python libraries."
"For improvement, I think the tool could make things easier for people who aren't very technical. There's a significant learning curve, and I've seen organizations give up because of it. Making it quicker or easier for non-technical people would be beneficial."
"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."
"Apache Spark lacks geospatial data."
"Technical expertise from an engineer is required to deploy and run high-tech tools, like Informatica, on Apache Spark, making it an area where improvements are required to make the process easier for users."
"It should support more programming languages."
"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
Pentaho Business Analytics
"Pentaho Business Analytics' user interface is outdated."
"Pentaho Business Analytics is hard to learn and not suited for initial users as it requires knowledge of operating systems, Java, and other technical skills."
"Another concern is that Pentaho is not customizable or interactive."
"We did not achieve the ROI. The work delivered to users had less value than the subscription cost."
"The repository should be improved."
"Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
"The tool is very good, and yet it has some problems as it relies heavily on Java."
"Version control would be a good addition."
 

Pricing and Cost Advice

Apache Spark
"Spark is an open-source solution, so there are no licensing costs."
"Licensing costs can vary. For instance, when purchasing a virtual machine, you're asked if you want to take advantage of the hybrid benefit or if you prefer the license costs to be included upfront by the cloud service provider, such as Azure. If you choose the hybrid benefit, it indicates you already possess a license for the operating system and wish to avoid additional charges for that specific VM in Azure. This approach allows for a reduction in licensing costs, charging only for the service and associated resources."
"Since we are using the Apache-licensed Apache Spark rather than the Databricks version, support and bug resolution are slow. The Apache license is free."
"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
"It is an open-source platform. We do not pay for its subscription."
"The solution is affordable and there are no additional licensing costs."
"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is also an open-source option without Cloudera."
"It is an open-source solution; it is free of charge."
Pentaho Business Analytics
"Pentaho is expensive."
"We were lucky enough to find a Pentaho OEM partner who offered a data warehouse model and the ETL software for about 60K SGD per year."
"Free and commercial versions are available."
851,604 professionals have used our research since 2012.
 

Comparison Review

it_user6978 - PeerSpot reviewer
Jun 10, 2013
Jaspersoft vs. Pentaho – Which one to use & is there any need to purchase the commercial edition
Any company (be it technology, manufacturing, human resources, e-commerce, SME, etc.) needs Business Intelligence to some extent. If cost is a consideration, the two BI tools at the forefront are Pentaho and Jaspersoft. But, often the same…
 

Top Industries

By visitors reading reviews

Apache Spark
Financial Services Firm: 26%
Computer Software Company: 13%
Manufacturing Company: 8%
Comms Service Provider: 6%

Pentaho Business Analytics
Financial Services Firm: 21%
Computer Software Company: 15%
Educational Organization: 7%
Real Estate/Law Firm: 6%
 


Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Apache Spark is open-source, so it doesn't incur any charges.
What needs improvement with Apache Spark?
There is complexity when it comes to understanding the whole ecosystem, especially for beginners. I find it quite complex to understand how a Spark job is initiated, the roles of driver nodes, work...
Seeking lightweight open source BI software
There are many... It would depend on the BI system architecture or enterprise legacy you have at your end... I would recommend as follows: 1) If you have legacies of SAP, Oracle - look for SAP...
What is your experience regarding pricing and costs for Pentaho Business Analytics?
Pentaho Business Analytics is priced similarly to other competitors such as QlikView and Tableau. I usually use the community edition.
What needs improvement with Pentaho Business Analytics?
Pentaho Business Analytics is hard to learn and not suited for initial users as it requires knowledge of operating systems, Java, and other technical ...
 

Also Known As

Apache Spark: No data available
Pentaho Business Analytics: Pentaho, Kettle, Hitachi Pentaho Business Analytics
 


Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Pentaho Business Analytics: Cargo 2000 Lufthansa, Marketo, ModCloth, Cardiac Science, Telefonica, ExactTarget, Active Broadband Networks, and Brussels Airport.
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop. Updated: May 2025.