Thursday, July 18, 2013

A Brokerage Firm and Pentaho

Monday, April 29, 2013

The Dirty Truth About Data and How To Clean It Using Pentaho!

Anyone who has worked with data has been there. You are trying to bring data into your organization and merge it with other data so that you can provide a complete picture of:
  • Your Organization
  • Your Customers
  • Your Industry
  • How all of the above relate to one another
Achieving this complete picture requires relying on data that originates and exists outside of your organization, for example data from Twitter, Facebook, LinkedIn, or YouTube. While we all know that the data within our own organizations is always clean ;) external data is usually full of "bad" or "dirty" data. This demonstration shows how you can use Pentaho Data Integration to help clean your data as you merge, enrich, and analyze it.

Setting the Stage

In this example, I am going to consume information from a flat file (CSV) provided by a third-party vendor that I am paying to do sentiment analysis on my products. My fictitious company, Big Wireless, sells wireless products (cell phones, tablets, notebooks, etc.) and services (cell phone, home line, etc.).

The purpose of this exercise is to bring in the data provided by this third party, which I receive on a daily basis. When processing the data, I need to capture any records that have bad or malformed data and report them back to the vendor. In other words, I am paying for a service, and this lets me verify that I am getting what I am paying for and that the vendor is living up to its quality of service (QoS) commitments.
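To make the idea of capturing bad records concrete, here is a minimal sketch in plain Python rather than PDI. It is not how the demonstration itself is built. The column names (product, sentiment, created_at), the allowed sentiment labels, the timestamp format, and the sample rows are all assumptions for illustration, since the vendor's actual file layout is not shown in this post.

```python
import csv
import io
from datetime import datetime

# Assumed labels; the vendor's real sentiment values are not given in this post.
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}
# Assumed timestamp format for the hypothetical created_at column.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def split_clean_and_dirty(csv_file):
    """Return (clean_rows, rejected_rows); each rejected row lists its problems."""
    clean, rejected = [], []
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(csv.DictReader(csv_file), start=2):
        problems = []
        if not (row.get("product") or "").strip():
            problems.append("missing product")
        if (row.get("sentiment") or "").strip().lower() not in ALLOWED_SENTIMENTS:
            problems.append("unexpected sentiment")
        try:
            datetime.strptime((row.get("created_at") or "").strip(), TIMESTAMP_FORMAT)
        except ValueError:
            problems.append("bad timestamp")
        if problems:
            rejected.append({"line": line_no, "reasons": problems, "row": row})
        else:
            clean.append(row)
    return clean, rejected


# Made-up sample data standing in for the daily vendor file.
sample = io.StringIO(
    "product,sentiment,created_at\n"
    "BigPhone 5,positive,2013-04-29 10:15:00\n"
    ",negative,2013-04-29 10:16:00\n"
    "BigTab 10,meh,not-a-date\n"
)
clean, rejected = split_clean_and_dirty(sample)
print(len(clean), "clean;", len(rejected), "rejected")
for item in rejected:
    print("line", item["line"], "->", ", ".join(item["reasons"]))
```

Keeping the line number and the reasons with each rejected row is what makes the report back to the vendor useful: they can see exactly which records failed and why.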

Below is a recorded demonstration of the following (based on the information above):

  1. Read in the CSV file from my third-party vendor
  2. Keep track of any "dirty" data
  3. Validate the expected sentiment
  4. Do a fuzzy lookup in order to standardize on my company's product names
  5. Enrich the data through several lookups:
    1. Look up detailed product information
    2. Look up the geocode of where the tweet originated
  6. Create some new time dimensions
  7. Put it in my database for further analysis
(Please excuse the tunnel voice effect :)
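
For readers who want to see the logic behind the fuzzy lookup and the new time dimensions outside of PDI, here is a rough Python sketch. It uses the standard library's difflib for approximate matching, which is my stand-in and not necessarily the algorithm PDI applies. The product catalog, the 0.6 similarity cutoff, and the timestamp format are hypothetical.

```python
import difflib
from datetime import datetime

# Hypothetical master list of Big Wireless product names.
PRODUCT_CATALOG = ["BigPhone 5", "BigTab 10", "BigBook Air"]


def standardize_product(raw_name, cutoff=0.6):
    """Map a free-text product name to the closest catalog name, or None."""
    by_lower = {name.lower(): name for name in PRODUCT_CATALOG}
    matches = difflib.get_close_matches(
        raw_name.strip().lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


def time_dimensions(timestamp):
    """Derive simple time attributes from an assumed 'YYYY-MM-DD HH:MM:SS' string."""
    ts = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "year": ts.year,
        "quarter": (ts.month - 1) // 3 + 1,
        "month": ts.month,
        "day_of_week": ts.strftime("%A"),
        "hour": ts.hour,
    }


print(standardize_product("bigfone 5"))
print(standardize_product("Toaster"))
print(time_dimensions("2013-04-29 10:15:00"))
```

Names that match nothing in the catalog come back as None, so they can be routed to the same "dirty" data report as the other rejected records instead of being silently guessed.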


Monday, April 22, 2013

The Future of Business Analytics Changes Today!

Pentaho Acquires Dashboard and UI Specialist Partner Webdetails
Portugal-based consultancy provides visual development expertise, consulting services and a new community leader

  • Pentaho is hiring and seeking superstars worldwide. Visit our careers page to learn more.

Orlando, Fla — April 22, 2013 — Delivering the future of analytics, Pentaho announced today that it has completed the acquisition of its Portugal-based consulting partner Webdetails. Pentaho will benefit from Webdetails’ visual interface development expertise and international consulting services provided by its 20-strong team. Webdetails’ founder Pedro Alves is a high-profile member of Pentaho’s open source community and will take on the new role of Senior VP, Community for Pentaho. As both parties are privately-held, the financial terms of the deal are undisclosed.

Raising Pentaho’s “visibility”

The Webdetails acquisition will complement and accelerate Pentaho’s research and development plans to enrich the user experience for both IT and business users of its business analytics platform and big data integration tools. This will include expanding Pentaho’s range of data visualizations available in dashboards, making visual development tools like Instaview even easier to use and delivering new visual interfaces to help new customers get started.

Webdetails has been designing plug-ins for Pentaho for several years, most notably its Community Tools or “CTools” series for creating and managing dashboards and reports. Last November, Webdetails collaborated with Pentaho to launch the Pentaho Marketplace, a destination on GitHub where developers can share, install and load cool plug-ins.

Meeting growing demand for Pentaho’s consulting services

As demand for advanced, big data and embedded analytics services continues to soar, Webdetails provides an experienced, international team to bolster Pentaho’s existing consulting services. Webdetails provides services worldwide, with most of its revenue coming from the US and Europe. Its clients include 4SightBI, St. Antonius Hospital, and Pentaho’s award-winning customer Stonegate Senior Living.

Redoubling community support

In addition to continuing his role as General Manager for Webdetails, founder Pedro Alves will take on the new role of Senior VP, Community. In this latter role, Alves will be the chief advocate and interface to Pentaho’s active open source developer community.

Doug Johnson, EVP and COO of Pentaho, commented, “Everything about Webdetails perfectly complements our operations as we continue to scale to meet demand fueled by the big data revolution. Webdetails’ expertise in high-end visualizations brings capabilities to help customers roll out exceptional visualizations with all data sources, particularly in Big Data. With Webdetails joining the Pentaho family we gain visual development talent, international consulting services and a highly respected open source community leader in Pedro.”

Pedro Alves (@pmalves) commented, “After five years as consultants and advocates for Pentaho in the business and open source communities, my team is incredibly proud to be officially joining the company. On a personal note, I am delighted to be taking on the role as community leader and look forward to the opportunity and challenge that this presents.”

Webdetails will continue doing business under its existing brand, but as a Pentaho company.

Sunday, April 14, 2013

Pentaho Big Data Forum - Washington D.C.

Featured Speakers:
  • Michael Lazar, Senior Systems Engineer, Cloudera
  • Will LaForest, Senior Director, 10gen
  • Ruhollah Farchtchi, Director of Federal Systems, Unisys
  • Wayne Johnson, Sales Consultant, Pentaho
  • Will Gorman, VP Chief Architect, Pentaho
  • Matt Casters, PDI Architect, Pentaho

Join Pentaho for a half-day big data forum in Washington D.C. Do not miss out on the opportunity to connect with key Pentaho leaders and hear the latest big data hot topics from our featured partners, Cloudera, 10gen and Unisys.

Tuesday, April 23, 2013
1101 Wilson Blvd.
Arlington, VA 22209

For questions or more information, please contact Laura Tuohy at

Time                       Agenda Item
8:00 a.m. - 8:30 a.m.      Breakfast & Registration
8:30 a.m. - 9:30 a.m.      Pentaho Big Data Update
9:30 a.m. - 10:15 a.m.     Cloudera Big Data Presentation
10:15 a.m. - 10:30 a.m.    Pentaho Business Analytics Update & Demo
10:30 a.m. - 10:45 a.m.    Coffee Break
10:45 a.m. - 11:30 a.m.    10gen Big Data Presentation
11:30 a.m. - 12:00 p.m.    Unisys Big Data Presentation
12:00 p.m. - 1:00 p.m.     Lunch & Kettle Presentation on PDI for Big Data