
Apache Spark vs Pentaho Business Analytics comparison

 

Comparison Buyer's Guide

Executive Summary


Categories and Ranking

Apache Spark
Average Rating: 8.4
Reviews Sentiment: 7.7
Number of Reviews: 65
Ranking in other categories: Hadoop (1st), Compute Service (4th), Java Frameworks (2nd)

Pentaho Business Analytics
Average Rating: 8.0
Reviews Sentiment: 6.8
Number of Reviews: 44
Ranking in other categories: BI (Business Intelligence) Tools (20th), Cloud Operations Analytics (4th), Reporting (16th)
 

Mindshare comparison

Apache Spark and Pentaho Business Analytics aren’t in the same category and serve different purposes. Apache Spark is designed for Hadoop and holds a mindshare of 17.5%, down 21.4% compared to last year.
Pentaho Business Analytics, on the other hand, focuses on BI (Business Intelligence) Tools, holds 0.5% mindshare, down 0.6% since last year.
 

Featured Reviews

Ilya Afanasyev - PeerSpot reviewer
Reliable, able to expand, and handle large amounts of data well
We use batch processing. It works well with our formats and file versions. There's a lot of functionality. Each hour, our pipeline copies the changes from MongoDB to a specific file. Previously, the pipeline copied all of the data from every table on each run, even when nothing had changed. The tables hold a lot of data, and the latest MongoDB version makes it possible to read only the changed data. This reduced the cost and configuration of the cluster, and we saved about $150,000. The solution is scalable. It's a stable product.
Sayan König - PeerSpot reviewer
Flexible, easy to understand, and simple to set up
The repository should be improved. There should be the possibility to have versioning, to make it combinable with some Git repositories or something like that, to check out the processes and make sure it has a traceable history. The solution could really be improved. There are too many bugs in our version.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The product's deployment phase is easy."
"ETL and streaming capabilities."
"It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance."
"Features include machine learning, real time streaming, and data processing."
"The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly."
"The solution is scalable."
"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
"One of Apache Spark's most valuable features is that it supports in-memory processing, the execution of jobs compared to traditional tools is very fast."
"The initial setup is pretty straightforward."
"The most valuable feature of Pentaho is the Tableau report."
"The interface of Pentaho Business Analytics is easy to use and does not require high-level skills to operate the tool."
"Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
"It is robust, offers market intelligence, and utilizes modules effectively."
"I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
"Easy to use components to create the job."
"Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, and needs to be able to search and analyze it."
 

Cons

"Needs to provide an internal schedule to schedule spark jobs with monitoring capability."
"The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate."
"Apache Spark lacks geospatial data."
"When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
"The setup I worked on was really complex."
"It would be beneficial to enhance Spark's capabilities by incorporating models that utilize features not traditionally present in its framework."
"I would like to see integration with data science platforms to optimize the processing capability for these tasks."
"The solution’s integration with other platforms should be improved."
"Version control would be a good addition."
"Pentaho Business Analytics is hard to learn and not suited for initial users as it requires knowledge of operating systems, Java, and other technical skills."
"Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
"The repository should be improved."
"The tool is very good, and yet it has some problems as it relies heavily on Java."
"We did not achieve the ROI. The work delivered to users had lesser value than the subscription cost."
"Another concern is that Pentaho is not customizable or interactive."
"Pentaho Business Analytics' user interface is outdated."
 

Pricing and Cost Advice

"Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
"It is quite expensive. In fact, it accounts for almost 50% of the cost of our entire project."
"The tool is an open-source product. If you're using the open-source Apache Spark, no fees are involved at any time. Charges only come into play when using it with other services like Databricks."
"Apache Spark is an expensive solution."
"It is an open-source platform. We do not pay for its subscription."
"We are using the free version of the solution."
"On the cloud model can be expensive as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
"Considering the product version used in my company, I feel that the tool is not costly since the product is available for free."
"Pentaho is expensive."
"Free and commercial versions are available."
"We were lucky enough to find a Pentaho OEM partner who offered a data warehouse model and the ETL software for about 60K SGD per year."
845,406 professionals have used our research since 2012.
 

Comparison Review

it_user6978 - PeerSpot reviewer
Jun 10, 2013
Jaspersoft vs. Pentaho – Which one to use & is there any need to purchase the commercial edition
Any company (be it technology, manufacturing, human resources, e-commerce, SME, etc.) needs Business Intelligence to some extent. If cost is one of the deciding factors, then the two BI tools at the forefront are Pentaho and Jaspersoft. But, often the same…
 

Top Industries

By visitors reading reviews
Financial Services Firm: 28%
Computer Software Company: 13%
Manufacturing Company: 8%
Comms Service Provider: 5%

Financial Services Firm: 26%
Computer Software Company: 13%
Educational Organization: 7%
Government: 7%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
 

Questions from the Community

What do you like most about Apache Spark?
We use Spark to process data from different data sources.
What is your experience regarding pricing and costs for Apache Spark?
Compared to other solutions like Doc DB, Spark is more costly due to the need for extensive infrastructure. It requires significant investment in infrastructure, which can be expensive. While cloud...
What needs improvement with Apache Spark?
The Spark solution could improve in scheduling tasks and managing dependencies. Spark alone cannot handle sequential tasks, requiring environments like Airflow scheduler or scripts. For instance, o...
Seeking lightweight open source BI software
There are many...It would rather depend what System BI architecture or Enterprise legacy you have at your end...I would recommend as follows: 1) If you have legacies of SAP, Oracle - look for SAP...
What is your experience regarding pricing and costs for Pentaho Business Analytics?
For those starting to use this tool, there is a free version available which is beneficial. The company also finds the pricing to be good.
What needs improvement with Pentaho Business Analytics?
The tool is very good, and yet it has some problems as it relies heavily on Java. The platform works with Java, and it has brought some issues to the company.
 

Also Known As

Apache Spark: No data available
Pentaho Business Analytics: Pentaho, Kettle, Hitachi Pentaho Business Analytics
 

Sample Customers

Apache Spark: NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
Pentaho Business Analytics: Cargo 2000 Lufthansa, Marketo, ModCloth, Cardiac Science, Telefonica, ExactTarget, Active Broadband Networks, Brussels Airport
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop. Updated: March 2025.