PeerSpot user
Pentaho Specialist/Free Software Expert at a tech services company with 10,001+ employees
Consultant
One has only to enable the jobs and transformations to take advantage of PDI's clustering abilities.

What is most valuable?

Pentaho is a suite with five main products: Pentaho Data Integration for ETL; the Pentaho Business Analytics Server for results delivery; and the development clients Report Designer, Metadata Editor, and Schema Workbench.

Pentaho Data Integration's (PDI, formerly Kettle) features and resources are virtually unbeatable, as it can handle everything from the smallest Excel files to the most complex and demanding data loads. It is able to scale from a single desktop computer to many nodes, on premises or in the cloud. Not only is it powerful, it is also easy to use. I have never worked with the alternatives, such as Informatica's PowerCenter or Microsoft's SSIS, but I have always taken the opportunity to ask those who have. By their accounts, PDI is easier to use and achieves more with less effort than those other products.

Then there is the Pentaho BA Server, built to be the linchpin of BI delivery for enterprises. It is built on a scalable, auditable platform able to deliver everything from dashboards and reports to OLAP and custom-made features. It supports background processing, bursting of results by e-mail, load balancing (through the native load-balancing features of Java web servers like Tomcat), and integration with corporate directory services such as MS Active Directory and LDAP directories, with account management and lots of bells and whistles.

The suite's plugin architecture deserves special remark: both PDI and the BA Server are built to be easily extended with plugins. There are two plugin marketplaces, one for PDI and one for the BA Server, with a good supply of diverse features. If all those plugins are not enough, there are means to develop your own plugin, either by coding in Java (mostly for PDI) or, for the BA Server, with point-and-click ease using Sparkl, a BA Server plugin for easy development and packaging of new BA Server plugins (though some knowledge of JavaScript, CSS, and HTML is needed).

Any company is able to design and deliver a deep and wide-ranging BI strategy with Pentaho. Given its relatively low price next to comparable competition, the most valuable features are the data integration and the results delivery platform.

How has it helped my organization?

I work for the largest government-owned IT enterprise in Brazil, employing over 10,000 people with yearly earnings in excess of half a billion dollars. Designing and delivering timely BI solutions used to be a bogged-down process because everything involved license costs. With Pentaho we were able to better suit our needs and better serve our customers. We use the CE for our departmental BI needs, and deliver solid service to our customers using paid licenses. Also, being so complete, Pentaho has enabled a whole new level of experimentation and testing. We can completely evaluate a customer's need with CE licenses and then deliver the solution at a price, assembling it on EE licenses. We need paid support for our customers in order to be able to answer any outage in a timely manner.

What needs improvement?

Pentaho has a solid foundation and decent user interfaces. They are lacking, however, in the tool space for data exploration/presentation. The recent Data Discovery trend put a lot of strain on suppliers of visual data analysis tools, and Pentaho has chosen to strengthen its data integration features, aiming at the growing Big Data and Hadoop market. The work on visual data exploration tools was then mainly left for the community to tackle.

So, there is room for improvement regarding the graphical interface for data exploration and presentation. Please note that there is no want of decent tools, only that the tools are not as sharp and as beautiful as QlikView, for instance. Pentaho delivers, no question; it just does not please the eye as much.

For how long have I used the solution?

I have been using the whole Pentaho suite for nine years. I have also self-published a book on Pentaho and regularly write for my BI/Pentaho blog.


What was my experience with deployment of the solution?

Being such a young product, experiencing fast evolution and rapid company growth, things are not always bug-free. Every new release comes with its share of new bugs. Upgrades were not without concerns, although there was never a risk of losing data. Pentaho is simple to an extreme, and we hardly ever find a nasty dependency hurting our deliveries.

The main deployment problems were with LDAP and Apache integration. A team needs quite some knowledge of web server architecture for a smooth delivery experience.

What do I think about the stability of the solution?

We did encounter stability issues. Being a data-intensive application, Pentaho is quite sensitive to RAM limitations. Whenever not enough RAM is allocated for it to work, it progressively slows down to a crawl and then to a halt. Lots of well-managed disk cache and server clustering alleviate this, though.

What do I think about the scalability of the solution?

Pentaho scales very well.

Scaling Pentaho Data Integration is a breeze: just set up the machines, configure the master and the slaves, and that is it. One has only to enable the jobs and transformations to take advantage of PDI's clustering abilities, which might sound tricky but is easy nonetheless. The bottom line is that data integration scalability is limited only by the developers' ingenuity in compartmentalizing data processing, so that parallelization and remote processing make clustering profitable.
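For a rough idea of what that setup involves (a minimal sketch; the hostnames, ports, and file paths are illustrative assumptions, not from a specific installation), a PDI cluster is essentially a set of Carte slave servers plus jobs launched against them with the command-line tools that ship with PDI:

    # Start a Carte slave server on each worker node (host/port are examples)
    ./carte.sh node1.example.com 8081
    ./carte.sh node2.example.com 8082

    # From the master, run a job with Kitchen, PDI's command-line job runner
    ./kitchen.sh -file=/opt/etl/nightly_load.kjb -level=Basic

The clustering itself is declared inside the transformation: the slave servers are registered in a cluster schema, and the heavy steps are marked to run on it; that is the "enable the jobs and transformations" part mentioned above.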

The Pentaho BA Server also scales well, in a quite standard load-balancing scheme. Being a regular, well-behaved Java application, the BA Server can be clustered on a Java application server, like JBoss, or in an Apache/Tomcat multi-server load-balancing setup.
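As an illustration of such a setup (a minimal sketch only; the hostnames and route names are assumptions, and the Tomcat instances need matching jvmRoute settings for the sticky sessions to work), an Apache httpd front end balancing two Tomcat nodes running the BA Server could look like:

    # httpd.conf excerpt (requires mod_proxy, mod_proxy_ajp, mod_proxy_balancer)
    <Proxy "balancer://pentaho">
        BalancerMember "ajp://tomcat1.example.com:8009" route=node1
        BalancerMember "ajp://tomcat2.example.com:8009" route=node2
        ProxySet stickysession=JSESSIONID
    </Proxy>

    ProxyPass        "/pentaho" "balancer://pentaho/pentaho"
    ProxyPassReverse "/pentaho" "balancer://pentaho/pentaho"

Sticky sessions are the important detail here: the BA Server keeps login state in the HTTP session, so each user must keep hitting the same Tomcat node.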

It is not a job for the amateur Pentaho administrator, however. In fact, a Pentaho administrator alone will probably have some difficulty achieving server scaling, and would be better off getting help from web server clustering professionals.

How are customer service and support?

Customer Service:

My company has been served only by Pentaho's Brazilian representative, who are knockout-good guys and gals and deliver at any cost! They have even brought in Pentaho technicians from the USA to assess some of our issues. Only kudos to them. I cannot opine on US or European support, but I have no reason to think less of them.

Technical Support:

Technical support is a mixed issue with Pentaho. As previously stated, it is a young product from a young company. The technical support by means of instruction manuals, forums, wikis, and the like is quite good. However, the fast growth has left some gaps in the documentation body.

For instance, I needed to find out how to enable a certain feature in report design. I was not able to find it in the official help guides, but the project leader's blog had a post about it. Armed with the correct terminology, I was able to search the international forum, where the answer I needed was lying. So, overall it is good, but it is still on the road to a complete, centralized, well-managed, gapless documentation body.

Which solution did I use previously and why did I switch?

In fact, we are still using the whole lot: MicroStrategy, Business Objects, and PowerCenter. We have not turned off all those implementations; it is just that Pentaho sprang up all around us like a weed. It is so easy to start using, and gives results with so little effort, that it is almost impossible to use anything else. Most of the time, we offer other options only at the customer's request. Otherwise, left to ourselves, we are most likely to propose Pentaho.

How was the initial setup?

Hard answer: both. We got up to delivering results in almost no time. However, a sizeable lot of vicious little details kept resisting us - mostly issues with stability, later found to be associated with RAM limitations, and with user management, tied to LDAP integration. Part of those difficulties stemmed from bugs, too, so it was only a matter of time waiting for Pentaho to fix them.

After that, the customer kicked in a lot of small changes and adaptations, true to the "since-we-are-at-it" scope-creep spirit (some rightful, some pure fancy), which had us and Pentaho scratching our heads together. In the end we more or less helped them advance some updates to the Server, and delivered all that was asked.

What about the implementation team?

We started with our in-house team, and when things got too weird or complicated, the vendor team landed. After that baptism by fire we had a couple of hard-boiled ninjas who could firefight anything, and the vendor team was sent back home, with praises.

What was our ROI?

No ROI for us. The company I work for has no business approach to its BI strategy. All we as a company care about is making the customer happy, and that comes at the cost of not being able to turn down some unprofitable projects. So, Pentaho is a good tool, capable of delivering millions of dollars in new/recouped/saved revenue, but we are not positioned for that.

Thinking about it a bit more, the mere fact that we are able to deliver more, and hence take more orders, might be seen as a return on our investment. Yet I can't put an exact number on it, for even this kind of return is a little unclear.

What's my experience with pricing, setup cost, and licensing?

Pentaho is cheap, and becomes cheaper as your team masters it. However, it would be a total waste of good dollars to just take my word for it. Try it for free and then look for professional support from Pentaho. You can also compare other tools with Pentaho, but keep in mind that, apart from SAS, each of the other tools competes with only a part of Pentaho. So you must assemble a set of different products to compare fully against it.

Let us say you are going to build a standard dimensional data mart to serve up OLAP. Pentaho has a single price tag, which must be matched against MicroStrategy PLUS Informatica PowerCenter to make for a correct comparison.

The Community Edition, the free version, is not short on features when compared to the Enterprise Edition; it is just a bit uglier.

Matching Pentaho's license price against only one of them will give the wrong result.

Which other solutions did I evaluate?

Pentaho was a totally unknown product back in 2006-2007. We ran several feature comparison sheets. The biggest and most controversial were against Informatica's PowerCenter and MicroStrategy Intelligence Server. Both were matched by Pentaho to some degree, and there were few things Pentaho was not able to deliver then. But, and this is a rather strong but, most of the time Pentaho had to be tweaked to deliver those items. It was a match, all right, but not a finished product back then.

Since that time the suite has evolved a lot and has become more directly comparable, head to head, with those same products.

What other advice do I have?

Pentaho has huge potential to deliver quite a lot of BI value. But in these days, when BI is regarded as a simple multidimensional analytics tool, it can seem a bit bloated and off the mark. That is because Pentaho does not aim to be flashy and eye-pleasing for the commonplace reporting monger (reporting is the farthest you can get from BI and still smell like it), and it requires a bit of strategy to allow for ROI. If you are looking for an immediate, prompt, beautiful remedy, Pentaho might not be your pick. But if you know what you want to accomplish, go on and try it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Manager at a consultancy with 10,001+ employees
MSP
It helped in managing the data from different sources into one unique target. I would like to see what code the report tool generates.

What is most valuable?

  • Pentaho data integration
  • Most of the ETL stuff can be done with minimal coding
  • Reporting capabilities

How has it helped my organization?

It helped in managing the data from different sources into one unique target.

What needs improvement?

In the reporting tool, I would like to see what code it generates. As of now, there is no provision to see the underlying code of the PRD file.

For how long have I used the solution?

I've used it for one year.

What was my experience with deployment of the solution?

There have been no issues with deployment.

What do I think about the stability of the solution?

There have been no stability issues.

What do I think about the scalability of the solution?

There have been no issues scaling it.

How are customer service and technical support?

Customer Service:

I have not had to use the customer service.

Technical Support:

I have not had to use technical support.

Which solution did I use previously and why did I switch?

There was no other solution in place.

How was the initial setup?

It was straightforward at first, and became complex later owing to our understanding of the existing structures and how to align the ETL with them.

What about the implementation team?

We did it in-house. You need to have a good understanding of what the tool can offer, like ETL, MDM, and SCDs.

What's my experience with pricing, setup cost, and licensing?

We're using the free edition.

What other advice do I have?

It's moderately easy to use, learn, and implement. It's nice, and you should use it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
IT Manager at a transportation company with 51-200 employees
Vendor
In terms of functionality, they're not growing as fast as other companies. It's good for showing the need for BI.

What is most valuable?

Pentaho Data Integration (PDI).

Pentaho Analysis Services

Pentaho Reporting

How has it helped my organization?

We developed Sales and HR data marts, so nowadays managers of these departments get quick and flexible answers from them. I think it was an improvement because, in the past, each new analysis demanded IT resources and took time, and this no longer occurs. End users have much more freedom to discover the information they need.

What needs improvement?

I think that Pentaho can improve its UI a lot, as well as its tool for dashboard maintenance.

For how long have I used the solution?

2 years

What was my experience with deployment of the solution?

I think the most complex deployments are the solutions with the most hardcore implementations. Pentaho could invest more in making developers' lives easier.

What do I think about the stability of the solution?

Yes. Once in a while, we face an unexpected problem that takes time to overcome, and it causes problems with user satisfaction.

What do I think about the scalability of the solution?

No. I think the choice of Pentaho was right for my company. It fits our purpose very well, which was to demonstrate to the directors the power of BI for the business. But now there is a perception of the benefits, and the company is getting bigger. Perhaps, in the near future, I will evaluate other options, even Pentaho EE.

How are customer service and technical support?

Customer Service:

My company has a procedure to evaluate all of our suppliers and we have questions about promptness, level of expertise, pre-sale and post-sale, effectiveness and efficiency.

Technical Support:

7 out of 10

Which solution did I use previously and why did I switch?

Yes. When I started with Pentaho in 2011, I had already worked at another company that had the Cognos BI Suite as its BI solution.

How was the initial setup?

The initial setup was straightforward. The setup was done by my team, which had no expertise with the Pentaho BI Suite. In 2 days, I was presented with the first dashboards.

What about the implementation team?

I implemented my first Pentaho project with a vendor team, which helped us a lot, but its level of expertise could have been better. In the middle of the project, we had some delays related to questions that had to be clarified by Pentaho's professionals.

What was our ROI?

The ROI of this product is good, because you can have the first outputs in little time. But it's not excellent compared with other BI solutions, like QlikView or Tableau.

What's my experience with pricing, setup cost, and licensing?

My original setup cost for the first project was $30,000 and the final cost was about $35,000.

Which other solutions did I evaluate?

Yes: Cognos, MicroStrategy, and Jaspersoft.

What other advice do I have?

For me, Pentaho is not growing in functionality as fast as other companies in the same segment. The UI falls short, and for more complex solutions it's necessary to have good developers. However, being an open-source solution, I think it allows IT departments to show, with low investment, the importance of BI for the company.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
reviewer2031747 - PeerSpot reviewer
Administrative Assistant at a university with 10,001+ employees
Real User
Makes it easy to develop data flows and has a wide range of database connections
Pros and Cons
  • "Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
  • "Pentaho Business Analytics' user interface is outdated."

What is our primary use case?

I primarily use Pentaho Business Analytics to create ETL processes, monitoring processes, and hierarchies.

What is most valuable?

Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud.

What needs improvement?

Pentaho Business Analytics' user interface is outdated. It's also limited in out-of-the-box features, which forces you to develop features yourself. There are also some problems with having to update metadata manually, which I would like to see Pentaho fix in the future. 

What do I think about the stability of the solution?

Pentaho Business Analytics is stable.

What do I think about the scalability of the solution?

Pentaho Business Analytics is scalable (though I have only tested this lightly).

How are customer service and support?

Since Pentaho Business Analytics is open-source, it has a very helpful community.

Which solution did I use previously and why did I switch?

I previously used Microsoft Integration Services and Microsoft Azure Data Factory.

How was the initial setup?

The initial setup was easy.

What other advice do I have?

Pentaho Business Analytics is a very good product for those starting to work with ETL processes. Usually, it will solve every problem you may have when creating those processes, and it's free, with a big support community. However, it may not be the best choice if your company has a very strong relationship with Microsoft or if you want to work in the cloud. I would give Pentaho Business Analytics a rating of eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Consultant at a consumer goods company with 1,001-5,000 employees
Consultant
The Data Integration graphical drag and drop design is easy for new users to follow and can increase productivity.

Valuable Features

The Pentaho Business Analytics platform overall is an outstanding product that offers great cost-saving solutions for companies of all sizes. The platform is built on top of several underlying open-source projects driven by the community's contributions. There are several features I find invaluable, and with each release, improvements are made.

The Pentaho User Console provides a portal that makes it easy for users to explore information interactively. Dashboard reporting, scheduling jobs, and managing data connections are some of the features made easy by the console. More advanced users can extend Pentaho Analyzer with custom visualizations or create reporting solutions with Ctools. The Marketplace empowers the community to develop new and innovative plugins and simplifies plugin installation for users of the console. The plugin framework lets contributors extend the core services offered by the BI Server.

Pentaho Data Integration (Spoon) is another valuable tool for development. Spoon delivers powerful extraction, transformation, and load capabilities using a metadata approach. Data Integration's graphical drag-and-drop design is easy for new users to follow and can increase productivity. More advanced users can extend Pentaho Data Integration by creating transformations and jobs dynamically.

Improvements to My Organization

My company was able to reduce software costs and hire additional staff given the cost savings that Pentaho provided. We are moving towards a Hadoop environment after the migration of our current ETL processes, and Pentaho's easy-to-use development tools and big data analytics capabilities were a factor in choosing Pentaho as a solution.

Room for Improvement

For those who run the open-source Community Edition, it can at times be difficult to find updated references for support. Even for companies that use the Enterprise Edition, finding useful resources when a problem occurs can be difficult. Pentaho-driven best practices should be made available to both Community and Enterprise users to motivate and empower more users to use the solutions effectively.

Customer Service and Technical Support

Pentaho has stellar support services with extremely intelligent Pentaho and Hitachi consultants all over the world. Those support services and documentation are made available to Enterprise clients that have purchased the Enterprise Edition and have access to the support portal.

Initial Setup

Pentaho is easy to deploy, use, and maintain. It's a low-cost, fully supported business intelligence solution. I have used Pentaho in small and large organizations with great success.

Pricing, Setup Cost and Licensing

Enterprise licenses can be purchased for the full-service Enterprise Pentaho solution, which offers support through the portal and, for additional cost, access to Pentaho/Hitachi consultants.

Other Advice

Pentaho offers a Community Edition, which is an open-source solution and can be downloaded for free. The Community Edition truly gives most companies everything they need, provided your solution needs are matched with your business needs. As a cost-cutting option, Enterprise license fees can be paid to the vendor to fund on-demand support.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Final Thoughts – Part 6 of 6

Introduction

This is the last of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

Data Mining

In this sixth part, I originally intended to at least touch on the only part of the Pentaho BI Suite we have not talked about before: Data Mining. However, as I gathered my materials, I realized that Data Mining (along with its ilk: Machine Learning, Predictive Analytics, etc.) is too big a topic to fit in the space we have here. Even if I tried, the usefulness would be limited at best since, at the moment, while the results are used to solve real-world problems, the usage of Data Mining tools is still exclusively within the realm of data scientists.

In addition, as of late I use Python more for working with datasets that require a lot of munging, preparing, and cleaning. As an extension of that, I ended up using Pandas, scikit-learn, and other Python-specific data mining libraries instead of Weka (which is basically what the Pentaho Data Mining tool is).

So for those who are new to Data Mining with Pentaho, here is a good place to start: an interview with Mark Hall, one of the authors of Weka, who now works for Pentaho: https://www.floss4science.com/machine-learning-with-weka-mark-hall

The link above also has some links to where to find more information.

For those of you who are experienced data scientists, you have probably already made up your mind about which tool suits your needs best; just as I went with Python libraries, you may or may not prefer a GUI approach like Weka's.

New Release: Pentaho 5.0 CE

For the rest of this review, we will go over the changes that come with the highly anticipated release of the 5.0 CE version. Overall, there are a lot of improvements in various parts of the suite, such as PDI and PRD, but we will focus on the BI Server itself, where the largest impact of the new release can be seen.

A New Repository System

In this new release, one of the biggest shocks for existing users is the switch from the file-based repository system to the new JCR-based one. JCR (Java Content Repository) is a content repository standard; the implementation used is the Apache Foundation's database-backed project, code-named "Jackrabbit."

The Good:

  • Better metadata management
  • No longer need to refresh the repository manually after publishing solutions
  • A much better UI for dealing with the solutions
  • API to access the solutions via the repository which opens up a lot of opportunities for custom applications

The Bad:

  • It's not as familiar or convenient as the old file-based system
  • Need to use a synchronizer plugin to version-control the solutions

It remains to be seen if this switch will pay off for both the developers and the users in the long run. But it is stable and working for the most part, so I can't complain.

The Marketplace

One of the best features of the Pentaho BI Server is its plugin-friendly architecture. In version 5.0, this architecture has been given a new face called the Marketplace:

This new interface serves two important functions:

  1. It allows admins to install and update plugins (almost all Pentaho CE tools are written as plugins) effortlessly
  2. It allows developers to publish their own plugins to the world

Several new plugins are already available with this new release, notably Pivot4J Analytics, an alternative to Saiku that shows a lot of promise of becoming a very useful tool for working with OLAP data. Another one that excites me is Sparkl, with which you can create your own custom plugins.

The Administration Console

The new version also brings about a new Administration Console where we manage Users and Roles:

No longer do we have to fire off another server just to do this basic administrative task. In addition, you can manage the mail server (no more wrangling configuration files).

The New Dashboard Editor

As we discussed in Part V of this review, the CDE is a very powerful dashboard editor. In version 5.0, the list of available Components is further lengthened by new ones, and the overall editor seems more responsive in this new release.

Usage experience: The improvements in the dashboard editor are helping me create dashboards for my clients that go beyond static ones. In fact, the one below (for demo purposes only) has a level of interactivity that rivals a web application or an electronic form:

NOTE: Nikon and Olympus are trademarks of Nikon Corporation and Olympus Group respectively.

Parting Thoughts

Even though the final product of a Data Warehouse or a BI system is a set of answers and forecasts, or dashboards and reports, it is easy to forget that without the tools that help us consolidate, clean up, aggregate, and analyze the data, we would never get to the results we are aiming for.

As you can probably tell, I serve my clients with whatever tools make sense given their situation, but time and again the Pentaho BI Suite (the CE version especially) has risen to fulfill the need. I have created Data Warehouses from scratch using Pentaho BI CE: pulling in data from various sources using PDI, creating OLAP cubes with the PSW that end up as the data sources for the various dashboards (financial dashboards, inventory dashboards, marketing dashboards, etc.), and publishing reports created using PRD.

Of course my familiarity with the tool helps, but I am also familiar with a lot of other BI tools beside Pentaho. And sometimes I do have to use other tools in preference to Pentaho because they suit the needs better.

But as I always mention to my clients, unless you have a good relationship with the vendor that spares you from paying hundreds of thousands per year just to be able to use tools like IBM Cognos, Oracle BI, or SAP BusinessObjects, there is a good chance that Pentaho (either the EE or the CE version) can do the same for less, even zero license cost in the case of CE.

Given the increased awareness of the value of data analysis in today's companies, these BI tools will continue to become more and more sophisticated and powerful. It is up to us business owners, consultants, and data analysts everywhere to develop the skills to harness the tools and crank out useful, accurate, and yes, easy-on-the-eyes decision-support systems. And I suspect that we will always see Pentaho as one of the viable options, a testament to the quality of the team working on it. The CE team in particular: it would be amiss not to acknowledge their efforts to improve and maintain a tool this complex using the open source paradigm.

So here we are, at the end of the sixth part. Writing this six-part review has been a blast, and I would like to give a shout-out to IT Central Station, which has graciously hosted this review for all to benefit from. Thanks for reading.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Dashboards – Part 5 of 6

Introduction

This is the fifth of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this fifth part, we'll be discussing how to create useful and meaningful dashboards using the tools available to us in the Pentaho BI Suite. As a complete Data Warehouse building tool, Pentaho offers the most important feature for delivering enterprise-class dashboards, namely Access Control Lists (ACLs). A dashboard-creation tool without this ability to limit dashboard access to a particular group or role within the company is missing a crucial feature, and is something we cannot recommend to our clients.

On the Enterprise Edition (EE) version 5.0, dashboard creation has a user-friendly UI that is as simple as drag-and-drop. It looks like this:

Figure 1. The EE version of the Dashboard Designer (CDE in the CE version)

Here the user is guided to choose a type of grid layout that is already prepared by Pentaho. Of course, options to customize the look and change individual components are available under the hood, but it is clear that this UI is aimed at end users looking for quick results. More experienced dashboard designers would feel severely restricted by it.

In the rest of this review, we will go over dashboard creation using the Community Edition (CE) version 4.5. Here we are going to see a more flexible UI, which unfortunately also demands familiarity with JavaScript and chart library customization to create anything more than just basic dashboards.

BI Server Revisited

In the Pentaho BI Suite, dashboards are set up in these two places:

  1. Using special ETLs, we prepare the data to be displayed on the dashboards according to the frequency of update required by the user. For example, for daily sales figures, the ETL would be scheduled to run every night (see the sketch after this list). Why do we do this? Because the benefits are two-fold: it increases the performance of the dashboards, because they work with pre-calculated data, and it allows us to apply dashboard-level business rules.
  2. The BI Server is where we design, edit, and assign access permissions to dashboards. Deep URLs can be obtained for a particular dashboard to be displayed on a separate website, but some care has to be taken to go through Pentaho user authorization; depending on the web server setup, it could be as simple as passing authorization tokens, or as complex as registering and configuring a custom module.
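As a sketch of the scheduling in step 1 (the paths, user, and job name below are hypothetical), the nightly data preparation can be as simple as a cron entry running the prepared ETL job with Kitchen, PDI's command-line job runner:

    # /etc/cron.d/dashboard-etl: refresh the daily sales figures at 01:30 every night
    30 1 * * * etl /opt/pentaho/data-integration/kitchen.sh -file=/opt/etl/refresh_daily_sales.kjb -level=Basic >> /var/log/etl/refresh_daily_sales.log 2>&1

The BI Server's own scheduler can serve the same purpose; cron is just the most transparent example.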

Next, we will discuss each of these steps in creating a dashboard. As usual, the screenshots below are sanitized and there are no real data being represented. Data from a fictitious microbrewery is used to illustrate and relate the concepts.

Ready, Set, Dash!

The first step is to initiate the creation of a dashboard. This is accomplished by selecting File > New > CDE Dashboard. A little background note: CDE (which stands for Community Dashboard Editor) is part of the Community Tools (or Ctools) created by the team that maintains and improves Pentaho CE.

After initiating the creation of a new dashboard, this is what we will see:

Figure 2. The Layout screen where we perform the layout step

The first thing to do is to save the newly created (empty) dashboard somewhere within the Pentaho solution folder (just like we did when saving Analytic or Ad-Hoc Reports). To save the dashboard currently being worked on, use the familiar New | Save | Save As | Reload | Settings menu. We will not go into detail on each of these self-explanatory menus.

Now look at the top-right section. There are three buttons that toggle the screen mode; this particular screenshot is in Layout mode.

In this mode, we take care of the layout of the dashboard. On the left panel, we see the Layout Structure. It is basically a grid made out of Row entries, which contain Column(s), which themselves may contain another set of Row(s). The big difference between a Row and a Column is that the Column actually contains the Components, such as charts, tables, and many other types. We give a name to a Column to tie it to its content. Because of this, the names of the Columns must be unique within a dashboard.

The panel to the right is a list of properties whose values we can set, mostly HTML and CSS attributes that tell the browser how to render the layout. It is recommended to create a company-wide CSS to show the company logo, colors, and other visual markings on the dashboard.

So basically, all we are doing in this Layout mode is determining where certain content should appear within the dashboard, and we do that by naming each of the places where we want that content to be displayed.

NOTE: Even though the contents are placed within a Column, it is a good practice to name the Rows clearly to indicate the sections of the dashboard, so we can go back later and be able to locate the dashboard elements quickly.

Lining-Up Components

After we defined the layout of the dashboard using the Layout mode, we move on to the next step by clicking on the Components button on the top horizontal menu as shown in the screenshot below:

Figure 3. The Components mode where we define the dashboard components

Usage experience: Although more complex, the CDE is well implemented and quite robust. In our use of it to build dashboards for our clients, we have never seen it produce inconsistent results.

In this Components mode, there are three sections (going from left to right). The left-most panel contains the selection of components (data presentation units). Ranging from simple tables to complex charting options (based on the Protovis data visualization library), we can choose how to present the data on the dashboard.

The next section to the right contains the components already chosen for the dashboard we are building. As we select each of these components, its properties are displayed in the section next to it. The Properties section is where we fill in information such as:

  • Where the data is coming from
  • Where the Component will be displayed in the dashboard. This is done by referring to the previously defined Column from the Layout screen
  • Customizations such as table column width, the colors of a pie chart, and custom scripting to be run before or after the component is drawn

This clean separation between the Layout and the Components makes it easy for us to create dashboards that are easy to maintain and that accommodate different versions of the components.

Where The Data Is Sourced

The last mode is the Data Source mode where we define where the dashboard Components will get their data:

Figure 4. The Data Sources mode where we define where the data is coming from

As seen in the left-most panel, the list of data source types is quite comprehensive. We typically use either SQL or MDX queries to fetch the data set in a format suitable for presentation in the Components we defined earlier.

For instance, a data set to be presented in a five-column table will look different from one to be presented in a pie chart.

This screen follows the others in terms of sections: we have (from left to right) the Data Source type list, the currently defined data sources, and the Properties section.

Usage experience: There may be some confusion for those who are not familiar with the way Pentaho defines a data source. There are two "data source" concepts represented here. One is the Data Source defined in this step for the dashboard; the other is the "data source" or "data model" that the dashboard's Data Source connects to and runs its query against.

After we define the Data Sources and name them, we go back to the Components mode and specify these names as the value of the Data source property of the defined components.

Voila! A Dashboard

By the time we have finished defining the Data Sources, Components, and Layout, we end up with a dashboard. Ours looks like this:

Figure 5. The resulting dashboard

The title of the dashboard and the date range are contained within one Row. So are the first table and the pie chart. This demonstrates the flexibility of the grid system used in the Layout mode.

The company colors and fonts used in this dashboard are controlled via the custom CSS specified as a Resource in the Layout mode.

All that is left to do at this point is to give the dashboard some role-based permissions so access to it will be limited to those who are in the specified role.

TIP: Never assign permissions at the individual user level. Why? Think about what has to happen when a person changes position and is replaced by someone else.

Extreme Customization

Anything from table column width to the rotation in degrees of the x-axis labels can be customized via the properties. Furthermore, for those who are well-versed in the JavaScript language, there are tons of things we can do to make the dashboard more than just a static display.

These customizations can be genuinely useful beyond just making things sparkle and easier to read. For example, with some scripting we can apply dashboard-level business rules to the dashboard.

Usage experience: Let's say we want some of the displayed numbers to turn red when they fall below a certain threshold. We do this using the post-execution property of the component, and the script looks like this:

Figure 6. A sample post-execution script
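In case the screenshot does not come through, here is a minimal sketch of what such a script can look like (the component and CSS selector names are invented for illustration; the $ calls rely on jQuery, which CDE dashboards already load):

    // Post-execution hook: runs after the table component has rendered
    function() {
        var threshold = 1000; // example business-rule threshold
        // Scan this component's numeric cells and flag values below the threshold
        $('#salesTable td.numeric').each(function() {
            var value = parseFloat($(this).text().replace(/,/g, ''));
            if (!isNaN(value) && value < threshold) {
                $(this).css('color', 'red'); // the dashboard-level business rule
            }
        });
    }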

Summary

The CDE is a good tool for building dashboards, and coupled with the ACL feature built into the Pentaho BI Server, it serves as a good platform for planning and delivering your dashboard solutions. Are there other tools out there that can do the same thing with the same degree of flexibility? Sure. But for a cost of only the time spent learning (which can be shortened significantly by hiring a competent BI consultant), the free license is quite hard to beat.

To squeeze out its potential, CDE requires a lot of familiarity with programming concepts such as formatting masks, JavaScript scripting, and pre- and post-events, and much of the time the answers to how-to questions can only be found in random conversations between Pentaho CE developers. So please be duly warned.

But if we can get past those hurdles, it can bring about some of the most useful and clear dashboards. Notice we didn't say "pretty" (as in "gimmicky"), because that is not what makes a dashboard really useful to CEOs and business owners in day-to-day decision-making.

Next in the final part (part-six), we will wrap up the review with a peek into the Weka Data Mining facility in Pentaho, and some closing thoughts.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Owner with 51-200 employees
Vendor
Pentaho BI Suite Review: Pentaho Analytics – Part 4 of 6
Introduction

This is the fourth of a six-part review of the Pentaho BI suite. In each part of the review, we will take a look at the components that make up the BI suite, according to how they would be used in the real world.

In this fourth part, we'll be discussing the Pentaho Analytics tools and facilities, which provide the ability to view and "slice and dice" data across multiple dimensions. This particular capability is the one most associated with the term "Business Intelligence," due to its usefulness in aiding cross-data-domain decision-making. Any decent BI suite has at least one facility with which users can perform data analysis.

One important note, specifically for Pentaho: the Analytics toolset is where the real advantage of the Enterprise Edition (EE) over the Community Edition (CE) starts to show through, beyond the much more polished UI.

In the Pentaho BI Suite, we have these analytics tools:

  1. Saiku Analytics (in EE this is called "Analysis Report") – A tool built into the Pentaho User Console (PUC) that utilizes the available analysis models. Do not confuse this with Saiku Reporting.
  2. Pentaho Model Data Source – In part three of the review, we discussed this facility for creating data models for ad-hoc reporting. Its second usage is to create an OLAP "cube" for use with the Saiku Analytics tool. Once this is set up by the data personnel, data owners can use it to generate analytic reports.
  3. Schema Workbench – A separate program that allows for handcrafting OLAP cube schemas. Proficiency with the MDX query language is not necessary but can come in handy in certain situations.

As usual, we'll discuss each of these components individually. The screenshots below are sanitized and there are no real data being represented. A fictitious company called “DonutWorld” is used to illustrate and relate the concepts.

Saiku Analytics (Analysis Report in EE)

One of the benefits of having a Data Warehouse is being able to model existing data in a structure that is conducive to analysis. If we try to feed tools such as this from a heavily normalized transactional database, we are inviting two problems:

1. We will be forced to do complex joins, which will manifest themselves as performance hits and as difficulties when business rules change

2. We lose the ability to apply non-transactional business rules to the data at the layer closest to the rule maintainers (typically those who work closely with the business decision-makers)

Therefore, to use this tool effectively, we need to think in terms of what questions need to be answered, then work our way backwards, employing data personnel to create a suitable model for said questions. Coincidentally, this process of modeling data suitable for reporting is a big part of building a Data Warehouse.

Learning experience: Those who are familiar with MS Excel (or LibreOffice) Pivot Tables will be at home with this tool. Basically, as the model allows, we can design the view or report by assigning dimensions to columns and rows, and then assigning measures to define what kind of numbers we are expecting to see. We will discuss below what 'dimension' and 'measure' mean in this context, but for an in-depth treatment, we recommend consulting your data personnel.

Usage experience: The EE version of this tool has a clearer interface as far as where to drop dimensions and measures, but the CE version is usable once we are accustomed to how it works. Other points for the EE version (version 5.0) are the ability to generate total sums in both the row and column directions and a much more usable Excel export.

Figure 1. The EE version of the Analysis Report (Saiku Analytics in CE)

Pentaho Model Data Source

The Data Source facility is accessible from within the PUC. As described in Part 3 of this review, once you have logged in, look for a section on the screen that allows you to create or manage existing data sources.

Here we are focusing on using this feature to set up "cubes" instead of "models." This is something that your data personnel should be familiar with, guided by the business questions that need answering.

Unlike the "models," the "cubes" are not flat; rather, they consist of multiple dimensions that determine how the measures are aggregated. From these "cubes," non-technical users can create reports, designing them just like they would Pivot Tables. The most useful aspect of this tool is that it abstracts the construction of an OLAP cube schema down to its core concepts. For example, given a fact table, this tool will try to generate an OLAP cube schema. And for the most part it does a good job, in the sense that the cube is immediately usable for generating Analysis Reports.

This tool also hides the distinction between Hierarchies and Levels of dimensions. For the most part, you can do a lot with just one Level anyway, so this is easier to grasp for beginners in OLAP schema design.

Learning experience: The data personnel must be 1) familiar with the BI table structures, or at the very least able to pinpoint which of the tables are facts and which are dimensions; and 2) comfortable with designing OLAP dimensions and measures. Data owners must be familiar with the structure and usage of the data. The combined efforts of these two roles are the building blocks of a workflow/process.

Usage experience: Utilizing the workflow/process defined above, an organization will generate a collection of OLAP cubes that can be used to analyze the business data with increasing accuracy and usefulness. The most important consideration, from the business standpoint, is that all of this will take some time to materialize. The wrong attitude here would be to expect instant results, which will not transpire unless the dataset is overly simplistic.

Figure 2. Creating a model out of a SQL query

NOTE: Again, this is where the maturity level of the Data Warehouse is tested. For example, a DW with sufficient maturity will notify the data personnel of any data model changes, which will trigger the updating of the OLAP cube, which in turn may or may not affect the created reports and dashboards.

If the DW is designed correctly, there should be quite a few fact tables that can readily be used in the OLAP cube.

Schema Workbench

The Schema Workbench is for those who need to create a custom OLAP schema that cannot be generated via the Data Source facility in the PUC. Usually this involves complicated measure definitions, multi-Hierarchy or multi-Level dimensions, or evaluating and optimizing MDX queries.

NOTE: In the 5.0 version of the PUC, we can import existing MDX queries into the Data Source Model, making them available for the Analysis Report (or the Saiku Ad-Hoc Report in the CE version). As can be seen in the screenshot below, the program is quite complex, with numerous features for handcrafting an OLAP cube schema.

Once a schema is validated in the Workbench, we need to publish it. Using the password defined in pentaho-solutions/system/publisher_config.xml, the Workbench will prompt for the location of the cube within the BI Server and its displayed name. From that point on, it will be available in the drop-down list at the top left of the Saiku Analytics tool.
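For reference, the publish password mentioned above lives in a small XML file on the server (shown here with a placeholder value):

    <!-- pentaho-solutions/system/publisher_config.xml -->
    <publisher-config>
        <publisher-password>YOUR_PUBLISH_PASSWORD</publisher-password>
    </publisher-config>

The Workbench will ask for this same password during the publish step.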

Figure 3. A Saiku report in progress

OLAP Cube Schema Considerations

Start by defining the fact table (bi_convection in the above example), then start defining dimensions and measures.

We have been talking about these concepts of dimension and measure. Let's briefly define them:

  1. A dimension is a way to view existing business data. For instance, a single figure such as a sales number can be viewed from several perspectives: per sales region, per salesperson or department, or chronologically. Using aggregation functions such as sum, average, min/max, standard deviation, etc., we can come up with different numbers that show the data in a manner we can draw conclusions from.
  2. A measure is a number or count of business data that can provide an indication of how the business is doing. For a shoe manufacturing company, the number of shoes sold is obviously one very important measure; another would be the average price of the shoes sold. Combined with dimensions, we can use the measures to make business decisions.

In the Schema Workbench, as you assign the existing BI table fields to the proper dimensions, it will validate the accessibility of the fields using the existing database connection, then create a view of the measures using a user-configurable way of aggregating the numbers.

In the creation of an OLAP cube schema, there is a special dimension that enables us to see data chronologically. Due to its universal nature, this dimension is a good one to start with. The time dimension is typically served by a special BI table that contains a flat list of rows with time and date information at the needed granularity (some businesses require seconds; others days, or even weeks or months). A minimal sketch of such a table appears below.
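The column names and types here are only illustrative, not a standard, but a day-grain version of that table is typically as plain as this:

    -- A flat, day-grain time dimension: one row per calendar day
    CREATE TABLE dim_date (
        date_key       INTEGER PRIMARY KEY,  -- surrogate key, e.g. 20131031
        full_date      DATE NOT NULL,
        year_number    SMALLINT NOT NULL,
        quarter_number SMALLINT NOT NULL,    -- 1 to 4
        month_number   SMALLINT NOT NULL,    -- 1 to 12
        day_of_month   SMALLINT NOT NULL,
        is_weekend     SMALLINT NOT NULL     -- 0 or 1
    );

It is populated once, years ahead, usually by a small PDI transformation or a stored procedure.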

TIP: Measures can be defined using the "case when" SQL construct, which opens up a whole other level of flexibility.
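For illustration (the fact table and the dim_date columns follow the hypothetical sketch above), a conditional measure could be derived like this:

    -- Split out a weekend-only sales measure alongside the total
    SELECT
        SUM(CASE WHEN d.is_weekend = 1
                 THEN f.sales_amount ELSE 0 END) AS weekend_sales,
        SUM(f.sales_amount)                      AS total_sales
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key;

In a Mondrian schema, the same "case when" expression would go into the measure's SQL expression rather than a standalone query.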

When should we use MDX vs SQL?

The MDX query language, with powerful concepts like ParallelPeriod, is suitable for generating tabular results containing aggregated data that is useful for comparison purposes.

True to its intended purpose, MDX allows for querying data that is presented in a multidimensional fashion, while SQL is easier to grasp and has a wider base of users and experts in any industry.

In reality, we use these two languages at different levels; the key is to be comfortable with both, and to discover the cases where one makes more sense than the other.
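To make the comparison concrete, here is a small MDX sketch in the spirit of the DonutWorld examples (the cube, dimension, and measure names are invented), using ParallelPeriod to put each quarter's sales next to the same quarter of the previous year:

    WITH MEMBER [Measures].[Sales LY] AS
        ( ParallelPeriod([Time].[Year], 1, [Time].CurrentMember),
          [Measures].[Sales] )
    SELECT { [Measures].[Sales], [Measures].[Sales LY] } ON COLUMNS,
           [Time].[2013].Children ON ROWS
    FROM [DonutSales]

Expressing the same year-over-year comparison in SQL would take self-joins or window functions; in MDX it is one function call. That is the kind of case where MDX earns its keep.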

NOTE: The powerful Mondrian engine is capable, but without judicious use of database indexing, query performance can easily crawl into minutes instead of seconds. This is where data personnel with database tuning experience are extremely helpful.

Summary

The analytics tools in the Pentaho BI Suite are quite comprehensive, certainly better than some of the competing tools out there. The analytic reports are made available on the Pentaho User Console (PUC), where users log in and initiate report generation. There are three facilities available:

The Analysis Report (or Saiku Analytics in CE version) is a good tool for building reports that look into an existing OLAP cube and do the “slicing and dicing” of data.

The Data Source facility can also be used to create OLAP cubes from existing BI tables in the DW. A good use of this facility is to build a collection of OLAP cubes to answer business questions.

The Schema Workbench is a standalone tool which allows for handcrafting custom OLAP cube schemas. This tool is handy for complicated measure definitions and multilevel dimensions. It is also a good MDX query builder and evaluator.

Next in part-five, we will discuss the Pentaho Dashboard design tools.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user