Friday, January 20, 2012

Increasing Pentaho's Data Sampling Set to Sample Your Entire CSV File

Pentaho offers a very valuable capability: you can quickly and easily prototype your data using a CSV file as the source.  Pentaho will bring in the CSV file, automatically profile the data, and set the data types and lengths.  One important thing to note, however, is that the default sample size is 200 rows, so the data types and lengths are based on those rows only, and values further down the file could fall outside them.  I recommend increasing this limit to the size of your CSV file.  For example, if your CSV file contains 10,000 rows, increase the limit to 10,000.  The next obvious question is, how do I do this?  It really is easy; just follow these simple steps:


  1. Open the following file in a text editor: C:\Program Files\pentaho\server\biserver-ee\pentaho-solutions\system\data-access\settings.xml
  2. Change the default value in the following tag to the desired sample size: <data-access-csv-sample-rows>10000</data-access-csv-sample-rows> (here I have increased it to 10000)
  3. Restart your BI Server
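
If you would rather not count rows and edit the file by hand, steps 1 and 2 can be scripted. What follows is only a rough sketch in Python, not anything Pentaho ships: the function names are my own, the CSV path in the usage comment is hypothetical, and it assumes the tag appears in settings.xml exactly as shown in step 2, holding a plain number. Whether the header line counts toward the sample is not something I have verified, so the sketch lets you choose. Back up settings.xml before trying it.

```python
import csv
import re

TAG = "data-access-csv-sample-rows"  # tag name as shown in step 2


def count_csv_rows(csv_path, has_header=True):
    """Count the rows in a CSV file, treating quoted multi-line fields as one row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        total = sum(1 for _ in csv.reader(f))
    # Optionally leave the header line out of the count.
    return max(total - 1, 0) if has_header else total


def set_sample_rows(settings_text, rows):
    """Return the settings text with the sample-rows value replaced by `rows`."""
    pattern = rf"(<{TAG}>)\s*\d+\s*(</{TAG}>)"
    new_text, replaced = re.subn(
        pattern,
        lambda m: f"{m.group(1)}{rows}{m.group(2)}",
        settings_text,
    )
    if replaced == 0:
        # Assumption violated: the tag is missing or not in the expected form.
        raise ValueError(f"<{TAG}> tag not found in settings text")
    return new_text


# Example usage (the CSV path is hypothetical; adjust both paths to your install):
#
# settings_path = r"C:\Program Files\pentaho\server\biserver-ee\pentaho-solutions\system\data-access\settings.xml"
# rows = count_csv_rows(r"C:\data\prototype.csv")
# with open(settings_path, encoding="utf-8") as f:
#     text = f.read()
# with open(settings_path, "w", encoding="utf-8") as f:
#     f.write(set_sample_rows(text, rows))
```

The replacement is done on the raw text rather than by parsing and rewriting the XML, so the rest of the file is left untouched. You still need to restart the BI Server afterwards, as in step 3.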

This will help ensure that your prototype is a success, so that you can show your end users how easy it is to use Pentaho to slice and dice their data and build reports and dashboards.
