Friday, July 27, 2012

Olympic Analysis

In honor of the Olympics starting, here is an analytical dashboard I created to compare and contrast the medaling countries from 1976 to 2008.  Enjoy!

Wednesday, May 30, 2012

Cohort Analysis using Pentaho

Sorry for the quality of the video below...I am working on a better one...Thanks!

Wednesday, May 02, 2012

Come hear me present Pentaho (sneak peek at my next use case)

If you are interested in seeing what I have been up to lately, please feel free to join me at 9:00 AM ET or 1:00 PM ET:

Sunday, April 29, 2012

Big Data, Business Analytics, Rich Visualizations, it's all here!

This week I will have the honor of showing you something that I think is pretty amazing...and, judging by how many people are adopting our technology, many of you find amazing as well.  This Thursday, May 3rd, I will be hosting two events showing the power and capability of Pentaho.  So what do Big Data, Business Analytics and Rich Visualizations have in common?  You will just have to register to find out!

I will give you one clue: you will get to learn this while simultaneously learning about a compelling, real-world example of Business Intelligence.

Date and Time

Thursday May 3, 2012
9:00 AM ET / 14:00 GMT
10:00 AM PT / 1:00 PM ET

Register Today


Donna Prlich
Director, Product & Solutions Marketing
Pentaho Corporation

Wayne Johnson
Senior Sales Engineer
Pentaho Corporation

Wednesday, April 25, 2012

Pentaho 4.5 is here

This is a very short blog post and I will have more information coming very soon, but you HAVE GOT to check this out...Pentaho 4.5 has new visualizations that really help you build very useful analytics.  Use cases are coming, and if you want to see me in action, I will be doing worldwide webinars...see below:

  • Visit to learn more about the new release and why ‘Better Together’ is best.
  • Download a free 30-day fully supported trial of Pentaho Business Analytics 4.5.
  • Attend the live demo of what’s new in Pentaho Business Analytics 4.5 on May 3, 2012. Register at

Monday, March 19, 2012

Processing Zip Files using Pentaho Data Integration

Recently I was on a call where the requirement was this: the company receives zip files from their customers, each containing a .csv file, and needs to load the contained .csv file into a database.  In this particular example, the structure of the .csv file is already known.  Currently this is a manual process: they unzip the file, then run the extracted file through a transformation that populates the database.  Their goal is to automate this process.  In the example below I have created a transformation that shows how Pentaho Data Integration can automatically take a zip file, unzip it, pass the contents to a transformation, and process the file into a database.

The first step is to create a job that handles the unzipping of the file:

The job then passes that file to a transformation that takes the enclosed .csv file, processes it and inserts it into a database:

I have created a file for you to download as a working example.  In order to run through this example, you need to perform the following steps:

  1. Download the “PDI - UNZIP AND PROCESS FILE.ZP” file and extract its contents into a directory called “PDI - UNZIP AND PROCESS FILE”

  2. Open up Pentaho Data Integration (do not connect to the repository)

  3. Open the following job: “unzip.kjb”

  4. Open the following transformation: “populated database from unzipped file.ktr”

  5. In the transformation, open up the “Table Output Step”

  6. Select the “SQL” button, then select “Execute”, then click “OK” and “Close” on the next two dialog screens.
    1. NOTE – We are using a database that comes with Pentaho for testing purposes

  7. Now you can go to the job and run the job.  The following will happen:

    1. The zip file contents will be placed in the main directory “PDI - UNZIP AND PROCESS FILE” and be called sales_data_{timestamp}.csv
    2. The job will then take the location of that .csv file and pass it to the transformation, which will take that file and insert it into the database
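All of the steps above are driven by PDI itself, but if you are curious what the job and transformation are doing under the hood, here is a rough sketch of the same unzip-then-parse flow in plain Java. The file name and CSV columns are made up for illustration, and where the real transformation inserts each row into a database via the Table Output step, this sketch just collects the rows:

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;
import java.util.zip.*;

public class UnzipAndProcess {
    // Pull every .csv entry out of the zip and parse its data rows, skipping
    // the header line (the job + transformation handle this inside PDI)
    static List<String[]> extractCsvRows(Path zip) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.getName().endsWith(".csv")) continue;
                BufferedReader reader = new BufferedReader(new InputStreamReader(zis));
                String line = reader.readLine(); // skip the header row
                while ((line = reader.readLine()) != null) {
                    rows.add(line.split(","));
                    // a real transformation would INSERT each row into the target table here
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Build a small sample zip standing in for the customer's upload
        Path zip = Files.createTempDirectory("pdi-demo").resolve("sales_data.zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry("sales_data.csv"));
            zos.write("id,amount\n1,100.50\n2,250.00\n".getBytes());
            zos.closeEntry();
        }
        System.out.println("Rows ready to load: " + extractCsvRows(zip).size());
    }
}
```

Running it prints the number of data rows recovered from inside the zip ("Rows ready to load: 2" for the two-row sample file it builds).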

Wednesday, February 22, 2012

Creating a Dynamic Target Range using Pentaho Report Designer and a Line Chart

Here is a quick example of how you can easily create a line chart with the ability to dynamically set a target range on the chart.  Here is what the end result looks like:

How do you do this?  Well, it is really pretty straightforward.  First, create your connection and retrieve the data you want to graph.  Then drag a chart object onto your report and set the category-column, value-column, and series-by-value properties to your data.  Once you have your chart reading your data set, create your parameters.  Here is an example of the lower range parameter:

Once you have your parameters created so that end users can dynamically set the range, all you need to do is add a post-processing script.  You can use the script below (the two range parameters, range1 and range2, are read via dataRow.get):

import java.awt.Color;
import java.awt.Font;

import org.jfree.chart.plot.CategoryPlot;
import org.jfree.chart.plot.IntervalMarker;
import org.jfree.ui.Layer;

// Read the lower and upper bounds of the target range from the report parameters
double lower = ((Number) dataRow.get("range1")).doubleValue();
double upper = ((Number) dataRow.get("range2")).doubleValue();

// Add a translucent background band between the two bounds
CategoryPlot plot = chart.getCategoryPlot();
IntervalMarker target = new IntervalMarker(lower, upper);
target.setLabel("Target Range/Goal");
target.setLabelFont(new Font("SansSerif", Font.ITALIC, 11));
target.setPaint(new Color(222, 222, 255, 128));
plot.addRangeMarker(target, Layer.BACKGROUND);

return chart;

That is really all it takes!  If you want to see a working example of this, feel free to download the sample .prpt file here.


Wednesday, January 25, 2012

Utilize existing SQL Code for your Report Generation with Pentaho

I often get asked if developers can utilize SQL that they already use in their legacy reports within Pentaho, and the answer is always a resounding YES!  There are hardly any companies out there that have not leveraged some type of Business Intelligence, even if it is just generating csv files from a SQL query against a database.  So there is a common concern when making any switch in technology, especially in Business Intelligence/Analytics: leveraging the work you have already done.  Here is a short video demonstration that shows how easy it is to use Pentaho Report Designer to simply cut and paste your existing SQL code, and how to extend that by parameterizing the report.  Enjoy the video!
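For instance, a legacy query can be pasted into Report Designer as-is and then parameterized with the ${parameter} placeholder syntax that Report Designer substitutes at run time.  The table and column names below are hypothetical, purely for illustration:

```sql
SELECT customer_name, SUM(order_total) AS total_sales
FROM orders
WHERE order_date >= ${StartDate}
GROUP BY customer_name
ORDER BY total_sales DESC
```

When the report runs, the end user's StartDate parameter value is injected into the query, so the same legacy SQL now drives an interactive report.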

Friday, January 20, 2012

Increasing Pentaho's Data Sampling Set to Sample your entire CSV File

Pentaho offers a very valuable capability: you can quickly and easily prototype your data using a CSV file as the source data for the prototype.  Pentaho will bring in the CSV file, automatically profile the data, and set the data types and lengths.  One important thing to note, however, is that the default sampling limit is set to 200 rows.  I would recommend that you increase this limit to the size of your CSV file.  For example, if you have a CSV file that contains 10,000 rows, I would increase the limit to 10,000.  The next obvious question is: how do I do this?  Well, it really is easy.  Just follow these simple steps:

  1. Open the following file in a text editor:   C:\Program Files\pentaho\server\biserver-ee\pentaho-solutions\system\data-access\settings.xml
  2. Change the default value located in the following tag to the desired sampling amount:  <data-access-csv-sample-rows>10000</data-access-csv-sample-rows> (in this tag I have increased it to 10000)
  3. Restart your BI Server
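To see why the sample size matters, here is a toy sketch (plain Java, purely illustrative and not Pentaho code) of the kind of type profiling a CSV import performs.  If the first non-numeric value appears past the sample window, the column is profiled as numeric, and the rows beyond the window can then break your prototype:

```java
import java.util.ArrayList;
import java.util.List;

public class SampleProfiler {
    // Toy profiler: a column is INTEGER only if every sampled value parses as one
    static String inferType(List<String> values, int sampleRows) {
        for (String v : values.subList(0, Math.min(sampleRows, values.size()))) {
            try {
                Integer.parseInt(v);
            } catch (NumberFormatException e) {
                return "STRING";
            }
        }
        return "INTEGER";
    }

    public static void main(String[] args) {
        // A 300-row column that is numeric except for one value at row 251
        List<String> column = new ArrayList<>();
        for (int i = 0; i < 300; i++) column.add(String.valueOf(i));
        column.set(250, "N/A"); // non-numeric value past the default sample window

        System.out.println(inferType(column, 200)); // default sample never sees it: INTEGER
        System.out.println(inferType(column, 300)); // sampling the whole file catches it: STRING
    }
}
```

With the default 200-row sample the column is profiled as INTEGER; raising the sample to the full file size correctly profiles it as STRING.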

This will help ensure a successful prototype, so that you can show your end users how easy Pentaho makes it to slice and dice their data, build reports, and build dashboards.

Tuesday, January 03, 2012

Exception Reporting Can Dramatically Save Time and Money

Business Intelligence has been around for a very long time, and all companies have at least some form of it, whether it is a written ledger or the most cutting-edge, advanced and innovative BI solution such as Pentaho ;).  That being said, Exception Reporting is still not widely used in business today.  What exactly is exception reporting?  I am glad you asked.

Exception reporting is a method of reporting that, well, reports on the exceptions: an exception being "a person or thing that is excluded from a general statement or does not follow a rule."  One of the key words here is "rule".  To really do exception reporting effectively, a business must have some key business rules in place in order to know what exceptions it is looking for.  Here are a couple of industry-specific examples:

Customer Service Example

Have you ever eaten fast food?  I am sure you have, and you have probably noticed when going through a drive-through that many of them have a timer clock by the window showing the current wait time.  Why is this?  It is because the company has deemed it important to monitor the average wait time for their customers.  The smaller the average, the higher the customer satisfaction...hence the "fast" in fast food.  The timer is there for the benefit of the line workers, and shift managers are well aware of what the acceptable average wait times are; they are often compensated for staying within range or even beating expectations.

What exception reporting does is collect all this information and present it from a summary level down to a granular one.  Follow me here...somewhere back at corporate there is a person who is responsible for, among other things, managing the average customer wait time.  Now, this person cannot sit there just staring at a dashboard that shows the average wait time, so they have set up some exception reporting.  To do this, they have defined the limits the company considers acceptable.  For example, an acceptable wait time is anywhere between 30 seconds and 3 minutes.  Any time a store's average falls outside that range, the responsible party is notified and can click on the number to receive a list of only those stores that are outside the limits, along with their current average wait times.
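As an illustrative sketch (plain Java, with made-up store names and numbers), the core of that exception report is nothing more than filtering every store's average against the rule's bounds and surfacing only the violators:

```java
import java.util.*;
import java.util.stream.Collectors;

public class WaitTimeExceptions {
    // Return only the stores whose average wait falls outside [lower, upper] seconds
    static List<String> outOfRange(Map<String, Integer> avgWait, int lower, int upper) {
        return avgWait.entrySet().stream()
                .filter(e -> e.getValue() < lower || e.getValue() > upper)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical average drive-through wait times per store, in seconds
        Map<String, Integer> avgWait = new LinkedHashMap<>();
        avgWait.put("Store 101", 95);
        avgWait.put("Store 102", 210); // over the three-minute ceiling
        avgWait.put("Store 103", 140);

        // The business rule: acceptable range is 30 seconds to 3 minutes
        System.out.println(outOfRange(avgWait, 30, 180)); // prints [Store 102]
    }
}
```

The responsible party sees only Store 102; the stores within range never clutter the report, which is exactly the point of reporting by exception.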

Financial Example

Financial institutions commonly have very strict regulations that they must abide by, and these regulations change often.  Because of this, they have often had to implement a vast array of information systems to meet these regulations.  The problem is that they now have the burden of making sure these systems are in balance.  Any time you increase the complexity and diversity of your information systems, you typically run into a data integration nightmare: data spread out among many systems, some of it duplicated and some not.  For the data that must be duplicated, it is important that those systems be able to reconcile with each other.

Recently, I helped a company do exactly this.  Their process was to receive a report from one of their financial systems as a PDF (the system being a SaaS application with certain limitations) and reconcile it with a legacy system.  It was several people's job to look at this report, which was over 2,000 pages, and compare account balances across both systems.  When the balances did not match, they had to look at the detail information and make the appropriate journal entries to correct it.  Since neither of these systems "talked" to the other, this was a very hands-on, manual process.  Well, with a little bit of know-how and of course Pentaho, they are now able to connect to both systems, compare balances automatically, and create a report that contains only the accounts that don't reconcile, with drill-down-to-detail capability...saving days of work for several FTEs (full-time employees).
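Again as an illustrative sketch rather than the actual solution (the account names, balances, and rounding tolerance here are invented), the heart of that reconciliation is comparing the two systems' balances account by account and keeping only the mismatches:

```java
import java.util.*;

public class Reconciliation {
    // Return the accounts whose balances differ between the two systems,
    // mapped to the pair of conflicting balances {saas, legacy}
    static Map<String, double[]> mismatches(Map<String, Double> saas,
                                            Map<String, Double> legacy) {
        Map<String, double[]> out = new LinkedHashMap<>();
        for (String account : saas.keySet()) {
            double a = saas.get(account);
            double b = legacy.getOrDefault(account, 0.0);
            if (Math.abs(a - b) > 0.005) // tolerate sub-cent rounding noise
                out.put(account, new double[]{a, b});
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> saas = new LinkedHashMap<>();
        saas.put("1000-Cash", 5250.00);
        saas.put("2000-AP", 1310.75);

        Map<String, Double> legacy = new LinkedHashMap<>();
        legacy.put("1000-Cash", 5250.00);
        legacy.put("2000-AP", 1290.75); // out of balance by 20.00

        // The exception report contains only the account that fails to reconcile
        System.out.println(mismatches(saas, legacy).keySet()); // prints [2000-AP]
    }
}
```

Instead of several people combing a 2,000-page PDF, the report lands on just the accounts needing journal entries, with the rest filtered out automatically.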

Pentaho is rather unique in that, with the way our Data Integration and Business Analytics are architected, performing exception reporting is, well, exceptional.  My apologies for such a corny ending...but I am a father...one has to have a sense of humor at this point ;)

BTW - Happy New Year to all my blog followers, I appreciate all your views!