Introduction to Pentaho Data Integration

Last updated on May 22, 2019



Pentaho Data Integration (PDI) is intended mainly for Extract, Transform, Load (ETL) work. It consists of the following elements:

DI Server (Server Application)

The Data Integration server executes jobs and transformations using the PDI engine. It provides default user and role-based security and can also be integrated with an existing LDAP/Active Directory security provider. It also lets us store transformations and jobs in one common place.

Design Tool (standalone) – Used for designing jobs and transformations.

Spoon – GUI tool for developing jobs and transformations.

Kitchen – Command-line tool for running jobs.

Pan – Command-line tool for running transformations.

Carte – Remote ETL server for running jobs and transformations on another machine.
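To make the split between Kitchen and Pan concrete, here is a minimal Python sketch that picks the right tool from the file extension and builds the command line. It assumes the Linux/macOS launcher names `kitchen.sh` and `pan.sh` with the `-file` and `-level` options; the helper name `build_pdi_command` and the file paths are hypothetical, and the exact option syntax may differ on your platform and PDI version.

```python
def build_pdi_command(file_path, log_level="Basic"):
    """Return the argument list for running a PDI job or transformation.

    Jobs (.kjb) go to Kitchen; transformations (.ktr) go to Pan.
    The launcher names assume a Linux/macOS install on the PATH.
    """
    if file_path.endswith(".kjb"):
        tool = "kitchen.sh"
    elif file_path.endswith(".ktr"):
        tool = "pan.sh"
    else:
        raise ValueError(f"not a PDI job or transformation: {file_path}")
    return [tool, f"-file={file_path}", f"-level={log_level}"]


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    print(" ".join(build_pdi_command("/etl/load_dw.kjb")))
    print(" ".join(build_pdi_command("/etl/dim_product.ktr", "Detailed")))
```

The returned list could be handed to `subprocess.run` or pasted into a scheduler entry.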

In a data warehouse, the historical data already available with the organization is loaded in one go. Since we cannot reload the entire data set into the data warehouse every day, we go forward with an incremental load for the regular runs.

The incremental load involves loading only the data that has changed at the source. We cannot sit and run the job and transformation manually every day, so we must schedule the job. Here we schedule it on a weekly basis using the Windows Task Scheduler, which runs the job at a specific time to load the incremental data into the data warehouse. This relies on the command prompt feature of PDI (Pentaho Data Integration), which lets a job run without opening the GUI.
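The idea of an incremental load can be sketched outside PDI in a few lines of Python. This is only an illustration under stated assumptions: two in-memory SQLite databases stand in for the source system and the data warehouse, and the table and column names (`product`, `dim_product`, `updated_at`) are hypothetical. In PDI the same logic would be built from steps in a transformation rather than written by hand.

```python
import sqlite3


def incremental_load(source, target):
    """Copy only rows changed since the last load; return the row count."""
    # High-water mark: the latest change timestamp already in the target.
    (last_loaded,) = target.execute(
        "SELECT COALESCE(MAX(updated_at), '') FROM dim_product"
    ).fetchone()
    # Extract only rows changed after the high-water mark.
    changed = source.execute(
        "SELECT id, name, updated_at FROM product WHERE updated_at > ?",
        (last_loaded,),
    ).fetchall()
    # Insert new rows and overwrite existing ones (keyed on the primary key).
    target.executemany(
        "INSERT OR REPLACE INTO dim_product (id, name, updated_at) "
        "VALUES (?, ?, ?)",
        changed,
    )
    target.commit()
    return len(changed)


def demo():
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    ddl = "(id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
    source.execute("CREATE TABLE product " + ddl)
    target.execute("CREATE TABLE dim_product " + ddl)
    source.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [(1, "Pen", "2019-05-01"), (2, "Ink", "2019-05-02")],
    )
    first = incremental_load(source, target)   # initial load
    source.execute(
        "UPDATE product SET name = 'Gel pen', updated_at = '2019-05-09' "
        "WHERE id = 1"
    )
    second = incremental_load(source, target)  # only the changed row
    third = incremental_load(source, target)   # nothing new
    rows = target.execute(
        "SELECT id, name FROM dim_product ORDER BY id"
    ).fetchall()
    return first, second, third, rows


if __name__ == "__main__":
    print(demo())
```

The first run behaves like the one-time historical load, and every later run moves only what changed, which is what makes a scheduled run cheap enough to repeat.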

Data Connections – Used for making connections from the source to the target database.

Transformation – Extracts data and loads it into the data warehouse.

What is Spoon?

Spoon is a GUI tool for developing jobs and transformations. It is easy to learn and user friendly. In this example, a transformation named ‘DIM_Product’ is already open. On the left side there are two tabs called View and Design. Here, we build a database connection to get data from, or load data into, the data warehouse. In the Design tab we have different nodes such as:

Input – Used to extract data from a source.

Output – Used to load data into a target.

Transform – Contains the connectors and logic applied to the data.
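The three node types map onto the three stages of a small pipeline. The sketch below is plain Python, not PDI itself: the function names, the sample CSV, and the use of a list as the "target" are all assumptions made for illustration.

```python
import csv
import io


def input_step(csv_text):
    """Input: extract rows from a CSV source as dictionaries."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform_step(rows):
    """Transform: tidy the product name and convert the price to a number."""
    for row in rows:
        yield {
            "name": row["name"].strip().title(),
            "price": float(row["price"]),
        }


def output_step(rows, target):
    """Output: load rows into the target (a plain list in this sketch)."""
    for row in rows:
        target.append(row)
    return len(target)


SAMPLE = "name,price\n  blue pen ,1.50\nnotebook,3.25\n"

loaded = []
count = output_step(transform_step(input_step(SAMPLE)), loaded)

if __name__ == "__main__":
    print(count, loaded)
```

Each function hands rows to the next one, much as hops connect steps on the Spoon canvas.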
