Business Intelligence in Real Life
Wednesday, January 25, 2012
Utilize existing SQL Code for your Report Generation with Pentaho
I often get asked if developers can utilize SQL that they already use in their legacy reports within Pentaho and the answer is always a resounding YES! There are hardly if any companies out there that have not leveraged some type of Business Intelligence, even if it is generating csv files from a SQL query within a database. So there is a common concern when making any switch in technology, especially Business Intelligence/Analytics, and that is leveraging the work you have already done. here is a short video demonstration that show how easy it is to leverage Pentaho Report Designer to simply cut and paste your existing SQL code and how to extend that by parameterizing the report. Enjoy the video!
Friday, January 20, 2012
Increasing Pentaho's Data Sampling Set to Sample your entire CSV File
Pentaho offers a very valuable capability in that you can quickly and easily prototype your data using a CSV file as your source data for the prototype. Pentaho will bring in the CSV file, automatically profile the data and set the data types and lengths. One important thing to note here, however, is the default sampling list is set to 200 rows. I would recommend that you increase this limit to the size of your CSV file. For example, if you have a CSV file that contains 10,000 rows, I would increase the limit to 10,000. The next obvious question is how do I do this? Well, it really is easy, just follow these simple steps:
- Open the following file in a text editor: C:\Program Files\pentaho\server\biserver-ee\pentaho-solutions\system\data-access\settings.xml
- Change the default value that is located in the following tag to the desired sampling amount:
data-access-csv-sample-rows>10000/data-access-csv-sample-rows> (in this tag I have increased it to 10000)
- Restart your BI Server
This will help ensure that you have a success production of your prototype so that you can show your end users how easy Pentaho is to use to slice and dice their data, build reports and dashboards.
Friday, January 06, 2012
Tuesday, January 03, 2012
Exception Reporting Can Dramatically Save Time and Money
Business Intelligence has been around for a very long time and all companies have at least some form of it, whether it is a written ledger or whether they are using the most cutting edge, advanced and innovative BI solution such as Pentaho ;). That being said, Exception Reporting is still not widely used in today's business. What exactly is exception reporting? I am glad you asked.
Exception reporting is a method of reporting that, well, reports on the exceptions, which is A person or thing that is excluded from a general statement or does not follow a rule. One of the key words is "rule", so to really be able to effectively do exception reporting, a business must have some key business rules in place in order to know what exception they are looking for, here are a couple of industry specific examples:
Customer Service Example
Have you ever eaten fast food? I am sure you have and you probably have noticed when going through a drive through that many of them have a timer clock by the window showing the current wait time. Why is this? This is because the company has deemed it important to monitor what the average wait time is for their customers. The smaller the average the higher the customer satisfaction...hence the name...fast food. The timer is there for the benefit of the line workers. Shift managers are well aware of what the acceptable average wait times are and are often compensated for staying within range or even beating expectations. What exception reporting does is collects all the information and presents it from a summary down to a granular basis. Follow me here...Somewhere, back at corporate there is a person who is responsible for managing what the average wait time is for their customers among other things. Now this person cannot sit there just staring at a dashboard that says what the average wait time is, so what they have done is setup some exception reporting. To do this, they have defined the acceptable limits in which the company allows. For example, an acceptable wait time is anywhere between 30 seconds and 3 minutes. Any time the average falls above that range, the responsible party is notified and is offered the ability to click on the number and receive a list of only those stores that are above the limits along with what their average wait time currently is.
Financial Example
Financial institutions commonly have very strict regulations that they must abide by and these regulations change often. Due to this, they often have implement a vast array of different information systems needed to meet these regulations. The problem is that now they have the burden of making sure that these systems are in balance. Any time you increase the complexity and diversity of all these information systems, you typically run into a data integration nightmare. You have data spread out among many systems, some of them duplicated and some not. In regards to the data that must be duplicated, it is important that those systems be able to reconcile with each other. Recently, I helped a company be able to do this. Their current process was to receive a report from one of their financial systems via a PDF, due to the system being a SaaS application with certain limitations and reconcile that with a legacy system they have. It was a few people job to look at this report that was over 2,000 pages and compare account balances across both systems. When the balances did not match, they had to look at the detail information and make any appropriate journal entries in order to correct it. Since neither of these systems "talked" to each other, this was a very hands on, manual process. Well, a little bit of know how and of course Pentaho, they now are able to connect to both systems and compare balances automatically and create a report that only contains accounts that don't reconcile with drill down to detail capability...saving days of several FTEs (full time employees).
Pentaho is rather unique in that the way our Data Integration and Business Analytics are architected, performing exception reporting is well exceptional. My apologies for such a corny ending...but...I am a father of 5...one has to have a sense of humor at this point ;)
BTW - Happy New Years to all my blog followers, I appreciate all your views!
Exception reporting is a method of reporting that, well, reports on the exceptions, which is A person or thing that is excluded from a general statement or does not follow a rule. One of the key words is "rule", so to really be able to effectively do exception reporting, a business must have some key business rules in place in order to know what exception they are looking for, here are a couple of industry specific examples:
Customer Service Example
Have you ever eaten fast food? I am sure you have and you probably have noticed when going through a drive through that many of them have a timer clock by the window showing the current wait time. Why is this? This is because the company has deemed it important to monitor what the average wait time is for their customers. The smaller the average the higher the customer satisfaction...hence the name...fast food. The timer is there for the benefit of the line workers. Shift managers are well aware of what the acceptable average wait times are and are often compensated for staying within range or even beating expectations. What exception reporting does is collects all the information and presents it from a summary down to a granular basis. Follow me here...Somewhere, back at corporate there is a person who is responsible for managing what the average wait time is for their customers among other things. Now this person cannot sit there just staring at a dashboard that says what the average wait time is, so what they have done is setup some exception reporting. To do this, they have defined the acceptable limits in which the company allows. For example, an acceptable wait time is anywhere between 30 seconds and 3 minutes. Any time the average falls above that range, the responsible party is notified and is offered the ability to click on the number and receive a list of only those stores that are above the limits along with what their average wait time currently is.
Financial Example
Financial institutions commonly have very strict regulations that they must abide by and these regulations change often. Due to this, they often have implement a vast array of different information systems needed to meet these regulations. The problem is that now they have the burden of making sure that these systems are in balance. Any time you increase the complexity and diversity of all these information systems, you typically run into a data integration nightmare. You have data spread out among many systems, some of them duplicated and some not. In regards to the data that must be duplicated, it is important that those systems be able to reconcile with each other. Recently, I helped a company be able to do this. Their current process was to receive a report from one of their financial systems via a PDF, due to the system being a SaaS application with certain limitations and reconcile that with a legacy system they have. It was a few people job to look at this report that was over 2,000 pages and compare account balances across both systems. When the balances did not match, they had to look at the detail information and make any appropriate journal entries in order to correct it. Since neither of these systems "talked" to each other, this was a very hands on, manual process. Well, a little bit of know how and of course Pentaho, they now are able to connect to both systems and compare balances automatically and create a report that only contains accounts that don't reconcile with drill down to detail capability...saving days of several FTEs (full time employees).
Pentaho is rather unique in that the way our Data Integration and Business Analytics are architected, performing exception reporting is well exceptional. My apologies for such a corny ending...but...I am a father of 5...one has to have a sense of humor at this point ;)
BTW - Happy New Years to all my blog followers, I appreciate all your views!
Wednesday, November 16, 2011
Reporting from Multiple Disparate Data Sources
One of the many powerful features of Pentaho is the ability for Pentaho Reporting and Pentaho Data Integration to seamlessly work together. This example will walk through a typical use case within many organizations. Let's say you want to report of multiple disparate data sources. You may have a database, a csv file, and a flat file that you want to be able to join the data together and report from but you don't want to have to combine this data in a data store. Additionally, you may want to give the end users the ability to sort this data.
USE CASE - Reporting from Raw Call Log Summary Information
I have a client who has accesses to some summary telephony data that records the outbound calling of their agents. Besides having a record of these outbound calls, they have another solution that also records the inbound calls. This inbound system reporting is a separate from their outbound system in terms of the storage of that data. Both outbound and inbound record sets records the same information, AgentID, CallType, and Duration. The outbound data file CallType column all contain the same CallType of outbound and the inbound data file has the value of inbound. They want to merge these disparate files and also lookup the AgentID from another source that lists all their agents with their corresponding IDs, this way on their report they can list the agent name instead of their ID.
SOLUTION - Pentaho Data Integration and Pentaho Report Designer
Pentaho Data Integration
What I have done is use PDI to bring all the data together and make it available for Report Designer. The transformation is made up of the following steps:
- CSV file input - This is the outbound call records
- Excel Input - This is the inbound call records
- Append Streams - This merges the outbound and inbound records into one data set
- Stream Lookup - This step looks up the value of AgentID in order to get the Agent Name
- CSV file input 2 - This step contains the AgentID along with the corresponding Agent Name
- Sort rows - This sorts the rows by Agent Name
- Unique Rows - This returns a list of Unique Agent Names use in the parameter list within our report
- Create Parameter - This defines a parameter that will be used to filter the results
- Filter rows - This will use the parameter to filter out the desired results
- Filtered Results - Displays all the desired results
- Non Matching Results - Displays all the results that were not selected
Once this transformation is created then it needs to be exported as an .ktr (xml) file and is now ready to be consumed by Report Designer.
Pentaho Report Designer
Once in Report Designer I can create a report and add a Pentaho Data Integration data source. All I will need to do is point to that transformation location (this is the .ktr file that you exported from Pentaho Data Integration). When you bring that transformation into Report Designer, it will ask you to select which step to utilize. This give you the ability to report from any point within the data integration step listed above. For this example, we will consume information from the Filtered Results step. In this report, we are also building a parameter which allows end users of this report to select which Agent they want to view the data for. We are dynamically building this list by selecting the Unique Rows step in our data integration transformation.
In order to get this example to work in your environment, you will need to download the files here, these files include:
- Multiple Sources with Parameter.ktr - Transformation
- Sample Report from three Sources with Parm.prpt - Report
- outbound.csv - Contains outbound call log data
- inbound.xls - Contains inbound call log data
- agentLookup.csv - Contains Agent Name lookup by AgentID Data
Make the following modifications to the Multiple Sources with Parameter.ktr file:
- Open the Multiple Sources with Parameter.ktr by going into Pentaho Data Integration and selecting File -> Import from an XML File
- Edit the following steps and point them to the files you downloaded:
- CSV file input - point to outbound.csv
- Excel Input - point to inbound.xls
- CSV file input 2 - point to agentLookup.csv
- Save transformation (this will save it to your repository)
- Export the transformation to the location where you saved the original (you can overwrite the original)
Make the following changes in the Sample Report from three Sources with Parm.prpt in Report Designer:
- Open the Report in Report Designer
- Click on the "Data" tab on the right side of Report Designer
- Double Click "Agent List" under the Pentaho Data Integration Data Set, browse to your recently saved and exported transformation and select the "Unique rows" step in the steps list.
- Double Click "Query 1" under the Pentaho Data Integration Data Set, browse to your recently saved and exported transformation and select the "Filtered Results" step in the steps list.
- Click on the preview button to see it work.
Keep in mind that if you want to publish this report to your BI Server you will have to bring the Multiple Sources with Parameter.ktr as a report resource, for more information on this, please see the Using a Pentaho Data Integration Data Source with the Pentaho Report Designer blog entry I did back in May
Monday, November 14, 2011
Application Integration using Pentaho Data Integration
Last week I was on a call where the person wanted to be able to easily access their data base and make that data available to other applications via both xml and Json. Some of their other applications require XML but they have a few that also use Json and as an organization, they are moving toward consuming Json as well. This will show an example on how you can easily use Pentaho Data Integration to gain access to any data source (for this example we will use the sample database that comes with Pentaho).
Use Case Example - Customer Payment Information
Currently Acme Company has all of their customer payment information in a database and they want to expose that information to a new application financial forecasting system that will help them better forecast:
- How likely a particular customer is in making future payments
- Segment customer based on their Accounts Receivables Turnover
- What is the trend in Accounts Receivables Turnover
This new financial forecasting application can accept data via either XML or Json. The first step is to build a simple transformation made up of three steps.
- Table Input Step - This step will pull back data from the database
- XML Output Step - This step will convert the data into a XML format
- Json Output Step - This step will convert the data into a Json format
The answer to how Pentaho can do this is similar to the famous quote from Fried Green Tomatoes, "The secret is in the sauce." In both the XML Output and the Json Output, Pentaho has a very cool feature that is enabled by a simple check box, it is the "Pass output to servlet" checkbox. In other words, this exposes Pentaho Data Integration data over a web service. Matt Casters has also documented this on his blog here.
In order to test the example, simply import the .ktr in the link below into Pentaho Data Integration and save it to your Pentaho Data Integration Enterprise Repository and access via a simple url call:
http://localhost:9080/pentaho-di/kettle/executeTrans/?rep=Enterprise%20Repository&user=joe&pass=password&trans=/home/joe/json1
The transformation gives you two options for XML or Json Output. To switch between the two different output formats, open the transformation and enable/disable the appropriate hop.
Download Transformation Here
Tuesday, October 18, 2011
Use Case: Grouping by Time, A Call Center Story
Recently a company that operates a rather large call center came to Pentaho to help them be able to better analyze their data and gain better insight to the performance and efficiency of their agents. One of the issues they had was that they wanted to see their call center data grouped by particular time categories so that they can better manage staffing for peak hours of the day as well as view who are the top performing agents in terms of volume. They had some basic information but the problem was that they wanted to have the grouping within the database so that they can gain some additional performance in pre-aggregating the data, due to the large volume of calls that are occurring. Here is a sample of what the data looked like (the sample below is only sample data generated by a sample data generator).
The first step in this process is to make the data "BI Ready", in other words take the data and convert to information by enriching it and grouping it by the designated time buckets, for this particular example they want to bucket their time in 15 minute increments starting from 8:00 to 5:00 (the operating hours of the call center). In this example, I am only using a snapshot of data ranging from 8:00 AM to 9:00 AM. This will allow them to answer questions like
- During what time span (within 15 minutes) is the call volume the highest?
- What time categories have the highest volume of incoming calls?
- What time categories have the highest volume of outbound calls?
The end transformation looks like this:
Transformation Detailed Description
Read Call Center Data from CSV - This is a csv Input Step that reads in the data from the CSV file. This does not have to be a csv file but can be almost any flat file, database, web service, etc.
Convert Date-Time to String - This is a Select Values step that allows you to rename or change the type of any of your fields, in this step we are converting the date-time to a string in order to split it into two different fields, one field that contains only date and the other only time.
Split Date-Time Field - Field Splitter allows you to split any string field based on a common requirement. In this example because the date-time field called BeginDate in our csv file has a date-time field that is a format where the date and time are separated by a space, we use that as the delimiter to split the field.
Convert Time to Number - The convert time to number is used to convert the Time field that we created in the previous step to a number in order to do our grouping into time ranges.
Create Time Range - This step will create a new column called TimeGroup, read in the rows of data, look at the time field set the appropriate value to the newly created field. For this example we are creating four groups: 8:00 AM to 8:15 AM, 8:15 AM to 8:30 AM, 8:30 AM to 8:45 AM, and 8:45 AM to 9:00 AM.
Convert Time to Time Format - This step converts the time field back to the time data type for reporting purposes.
Dummy (do nothing) - this step, as it implies does nothing, at this point you can load the data directly into a database, flat file, etc.
The end result of this transformation looks like this:
Download the zip file here for youto run this transformation. In order for it to work, you will need to import the .ktr file and then make sure to change the path of the csv file in the first step to the location of where you saved the csv file that is included in the .zip file.
Pentaho Data Integration Sample Library
When you download Pentaho and run our Data Integration
client tool (also known as Spoon), you also get a huge library of sample
transformations that can help you build out your solution. The great thing about these sample
transformation and jobs is that they are all working samples, so you can test
and learn by following the entire flow.
There are over 150 working sample transformation and over 40 working
jobs. You can access these
transformation and jobs anytime by going to the following directory:
[Install Directory]\pentaho\design-tools\data-integration\samples
To use any of these transformation in Pentaho Data
Integration follow these three easy steps:
- Start Pentaho Data Integration Client Tool (Spoon)
- Go to File > Import from an XML File

- Browse to the samples library location stated above and select the sample you wish to view and run.
Subscribe to:
Posts (Atom)


