The first step is to create a job that handles unzipping the file.
The job then passes the extracted file to a transformation, which reads the enclosed .csv file, processes it, and inserts it into a database.
I have created a working example for you to download. To run through this example, do the following:
- Download the “PDI - UNZIP AND PROCESS FILE.ZP” file and extract its contents into a directory called “PDI - UNZIP AND PROCESS FILE”
- Open up Pentaho Data Integration (do not connect to the repository)
- Open the following job: “unzip.kjb”
- Open the following transformation: “populated database from unzipped file.ktr”
- In the transformation, open the “Table Output” step
- Select the “SQL” button and then select “Execute” to run the generated SQL, which creates the target table. Then click “OK” and “Close” on the next two dialogs.
- NOTE – We are using a database that comes with Pentaho for testing purposes
- Now go back to the job and run it. The following will happen:
- The zip file contents will be placed in the main directory “PDI - UNZIP AND PROCESS FILE.ZP” and named sales_data_{timestamp}.csv
- The job will then pass the location of that .csv file to the transformation, which will read the file and insert its rows into the database
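
For readers who want to see the same flow outside of Pentaho, here is a minimal Python sketch of what the job and transformation do together: extract the .csv from the archive under a timestamped name, then load its rows into a table. This is not how PDI implements it. The function names, the timestamp format, the `sales_data` table name, and the use of SQLite with every column stored as TEXT are all assumptions made for illustration.

```python
import csv
import sqlite3
import zipfile
from datetime import datetime
from pathlib import Path


def unzip_with_timestamp(zip_path: Path, out_dir: Path) -> Path:
    """Extract the first .csv in the archive as sales_data_<timestamp>.csv.

    Mirrors the job's unzip stage; the timestamp format is an assumption.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = out_dir / f"sales_data_{stamp}.csv"
    with zipfile.ZipFile(zip_path) as archive:
        # Pick the first member whose name ends in .csv.
        member = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        target.write_bytes(archive.read(member))
    return target


def load_csv(csv_path: Path, conn: sqlite3.Connection, table: str = "sales_data") -> int:
    """Create the table from the CSV header if needed, then insert all rows.

    Mirrors the transformation stage; all columns are stored as TEXT.
    """
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = list(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)
```

To try it, call `unzip_with_timestamp` with the path to a zip file and an output directory, then pass the returned path to `load_csv` along with a connection such as `sqlite3.connect("example.db")`. Handing the extracted file's path from one function to the next is the same idea as the job passing the file location to the transformation.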