Wednesday, November 16, 2011

Reporting from Multiple Disparate Data Sources

One of the many powerful features of Pentaho is the ability for Pentaho Reporting and Pentaho Data Integration to seamlessly work together.  This example will walk through a typical use case within many organizations.  Let's say you want to report of multiple disparate data sources.  You may have a database, a csv file, and a flat file that you want to be able to join the data together and report from but you don't want to have to combine this data in a data store.  Additionally, you may want to give the end users the ability to sort this data.

USE CASE - Reporting from Raw Call Log Summary Information

I have a client who has accesses to some summary telephony data that records the outbound calling of their agents.  Besides having a record of these outbound calls, they have another solution that also records the inbound calls.  This inbound system reporting is a separate from their outbound system in terms of the storage of that data.  Both outbound and inbound record sets records the same information, AgentID, CallType, and Duration.  The outbound data file CallType column all contain the same CallType of outbound and the inbound data file has the value of inbound.  They want to merge these disparate files and also lookup the AgentID from another source that lists all their agents with their corresponding IDs, this way on their report they can list the agent name instead of their ID.

SOLUTION - Pentaho Data Integration and Pentaho Report Designer

Pentaho Data Integration
What I have done is use PDI to bring all the data together and make it available for Report Designer.  The transformation is made up of the following steps:

  1. CSV file input - This is the outbound call records
  2. Excel Input - This is the inbound call records
  3. Append Streams - This merges the outbound and inbound records into one data set
  4. Stream Lookup - This step looks up the value of AgentID in order to get the Agent Name
  5. CSV file input 2 - This step contains the AgentID along with the corresponding Agent Name
  6. Sort rows - This sorts the rows by Agent Name
  7. Unique Rows - This returns a list of Unique Agent Names use in the parameter list within our report
  8. Create Parameter - This defines a parameter that will be used to filter the results
  9. Filter rows - This will use the parameter to filter out the desired results
  10. Filtered Results - Displays all the desired results
  11. Non Matching Results - Displays all the results that were not selected



Once this transformation is created then it needs to be exported as an .ktr (xml) file and is now ready to be consumed by Report Designer.

Pentaho Report Designer

Once in Report Designer I can create a report and add a Pentaho Data Integration data source.  All I will need to do is point to that transformation location (this is the .ktr file that you exported from Pentaho Data Integration).  When you bring that transformation into Report Designer, it will ask you to select which step to utilize.  This give you the ability to report from any point within the data integration step listed above.  For this example, we will consume information from the Filtered Results step.  In this report, we are also building a parameter which allows end users of this report to select which Agent they want to view the data for.  We are dynamically building this list by selecting the Unique Rows step in our data integration transformation.

In order to get this example to work in your environment, you will need to download the files here, these files include:
  • Multiple Sources with Parameter.ktr - Transformation
  • Sample Report from three Sources with Parm.prpt - Report
  • outbound.csv - Contains outbound call log data
  • inbound.xls - Contains inbound call log data
  • agentLookup.csv - Contains Agent Name lookup by AgentID Data



Make the following modifications to the Multiple Sources with Parameter.ktr file:
  1. Open the Multiple Sources with Parameter.ktr by going into Pentaho Data Integration and selecting File -> Import from an XML File
  2. Edit the following steps and point them to the files you downloaded:
    1. CSV file input - point to outbound.csv
    2. Excel Input - point to inbound.xls
    3. CSV file input 2 - point to agentLookup.csv
  3. Save transformation (this will save it to your repository)
  4. Export the transformation to the location where you saved the original (you can overwrite the original)



Make the following changes in the Sample Report from three Sources with Parm.prpt in Report Designer:

  1. Open the Report in Report Designer
  2. Click on the "Data" tab on the right side of Report Designer
  3. Double Click "Agent List" under the Pentaho Data Integration Data Set, browse to your recently saved and exported transformation and select the "Unique rows" step in the steps list.
  4. Double Click "Query 1" under the Pentaho Data Integration Data Set, browse to your recently saved and exported transformation and select the "Filtered Results" step in the steps list.
  5. Click on the preview button to see it work.



Keep in mind that if you want to publish this report to your BI Server you will have to bring the Multiple Sources with Parameter.ktr as a report resource, for more information on this, please see the Using a Pentaho Data Integration Data Source with the Pentaho Report Designer blog entry I did back in May





Monday, November 14, 2011

Application Integration using Pentaho Data Integration

Last week I was on a call where the person wanted to be able to easily access their data base and make that data available to other applications via both xml and Json.  Some of their other applications require XML but they have a few that also use Json and as an organization, they are moving toward consuming Json as well.  This will show an example on how you can easily use Pentaho Data Integration to gain access to any data source (for this example we will use the sample database that comes with Pentaho). 

Use Case Example - Customer Payment Information

Currently Acme Company has all of their customer payment information in a database and they want to expose that information to a new application financial forecasting system that will help them better forecast:
  • How likely a particular customer is in making future payments
  • Segment customer based on their Accounts Receivables Turnover
  • What is the trend in Accounts Receivables Turnover
This new financial forecasting application can accept data via either XML or Json.  The first step is to build a simple transformation made up of three steps.
  1. Table Input Step - This step will pull back data from the database
  2. XML Output Step - This step will convert the data into a XML format
  3. Json Output Step - This step will convert the data into a Json format

The answer to how Pentaho can do this is similar to the famous quote from Fried Green Tomatoes, "The secret is in the sauce."  In both the XML Output and the Json Output, Pentaho has a very cool feature that is enabled by a simple check box, it is the "Pass output to servlet" checkbox.  In other words, this exposes Pentaho Data Integration data over a web service.  Matt Casters has also documented this on his blog here.

In order to test the example, simply import the .ktr in the link below into Pentaho Data Integration and save it to your Pentaho Data Integration Enterprise Repository and access via a simple url call:

http://localhost:9080/pentaho-di/kettle/executeTrans/?rep=Enterprise%20Repository&user=joe&pass=password&trans=/home/joe/json1

The transformation gives you two options for XML or Json Output.  To switch between the two different output formats, open the transformation and enable/disable the appropriate hop.

NOTE: This url call is specific to a generic installation of Pentaho Enterprise Edition.  The default installation port number is 9080, yours may vary depending on your installation

Download Transformation Here