
Introduction to Pentaho Data Integration

Last updated on Apr 28, 2020

Pentaho Data Integration (PDI) is intended mainly for Extract, Transform, Load (ETL) work. It consists of the following elements:

DI Server (Server Application)

The Data Integration server executes jobs and transformations using the PDI engine. It has default user and role-based security, and it can also be integrated with an existing LDAP/Active Directory security provider. It also lets us store transformations and jobs in one common place.

Design Tool (standalone) – Used for designing jobs and transformations

Spoon – GUI tool for developing jobs and transformations

Kitchen – Command-line tool for running jobs, including any transformations they call

Pan – Command-line tool for running transformations only

Carte – Remote ETL Server
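To make the split between Kitchen and Pan concrete, here is a minimal Python sketch that picks the runner from the file extension and builds the command line. It assumes the usual PDI conventions (`.kjb` for jobs, `.ktr` for transformations, the `kitchen.sh` and `pan.sh` launchers on the PATH, and the `-file` and `-level` options). The function name and file paths are hypothetical and only for illustration.

```python
from pathlib import PurePath

# Conventional PDI file extensions: .kjb for jobs, .ktr for transformations.
RUNNERS = {".kjb": "kitchen.sh", ".ktr": "pan.sh"}


def build_pdi_command(pdi_file: str, log_level: str = "Basic") -> list[str]:
    """Return the argument list that would run a job or a transformation."""
    suffix = PurePath(pdi_file).suffix.lower()
    if suffix not in RUNNERS:
        raise ValueError(f"not a PDI job or transformation: {pdi_file}")
    return [RUNNERS[suffix], f"-file={pdi_file}", f"-level={log_level}"]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(build_pdi_command("/etl/load_warehouse.kjb"))
    print(build_pdi_command("/etl/DIM_Product.ktr"))
```

The returned list can be passed to `subprocess.run` on a machine where PDI is installed. On Windows the launchers are `Kitchen.bat` and `Pan.bat`.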

In a data warehouse, the historical data already held by the organization is loaded in one go. Because we cannot reload the entire data set into the data warehouse every day, subsequent loads are incremental.
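The idea behind an incremental load can be sketched in a few lines of Python. This is an illustration of the concept, not how PDI implements it. It assumes each source row carries an `updated_at` timestamp, which is a hypothetical column, and that we remember when the previous load finished.

```python
from datetime import datetime


def changed_since(source_rows, last_loaded_at):
    """Return only the rows modified after the previous load."""
    return [row for row in source_rows if row["updated_at"] > last_loaded_at]


if __name__ == "__main__":
    # Hypothetical source rows; `updated_at` is an assumed change-tracking column.
    source = [
        {"id": 1, "name": "Widget", "updated_at": datetime(2020, 4, 1)},
        {"id": 2, "name": "Gadget", "updated_at": datetime(2020, 4, 27)},
    ]
    last_load = datetime(2020, 4, 20)  # when the previous load finished

    delta = changed_since(source, last_load)
    print([row["id"] for row in delta])

    # Advance the watermark so the next run skips what was just loaded.
    last_load = max((row["updated_at"] for row in delta), default=last_load)
```

A full load would send every row in `source` to the warehouse. The incremental load sends only `delta`.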

The incremental load involves loading only the data that has changed in the source system since the previous load. We cannot sit and run the job and transformation manually every day, so the job must be scheduled. In this example we schedule it weekly with the Windows Task Scheduler, which runs the job at a specific time to load the incremental data into the data warehouse. This relies on the command prompt feature of PDI (Pentaho Data Integration).
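As a rough sketch of the scheduling step, the Python snippet below assembles a `schtasks` command that registers a weekly task running a job through Kitchen. The task name, job path, day, and start time are all hypothetical, and `Kitchen.bat` is assumed to be reachable from the PATH. The path deliberately contains no spaces so that no extra quoting is needed. The snippet only builds and prints the command; it does not create the task.

```python
def weekly_kitchen_task(task_name, job_file, day="MON", start_time="02:00"):
    """Build a schtasks command that runs a PDI job once a week."""
    # Command the scheduler will launch: Kitchen with Windows-style options.
    kitchen_cmd = f"Kitchen.bat /file:{job_file} /level:Basic"
    return [
        "schtasks", "/Create",
        "/TN", task_name,    # name of the scheduled task
        "/TR", kitchen_cmd,  # program to run
        "/SC", "WEEKLY",     # schedule type
        "/D", day,           # day of the week
        "/ST", start_time,   # start time, 24-hour HH:MM
    ]


if __name__ == "__main__":
    # Hypothetical task name and job path, for illustration only.
    cmd = weekly_kitchen_task("IncrementalLoad", r"C:\etl\load_warehouse.kjb")
    print(" ".join(cmd))
```

On a Windows machine the list could be handed to `subprocess.run(cmd, check=True)`. The same task can also be created by hand in the Task Scheduler GUI.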

Data Connections – Used to connect to the source and target databases.

Transformation – Extracts data and loads it into the data warehouse.

What is Spoon?

Spoon is a GUI tool for developing jobs and transformations. It is easy to learn and user friendly. In this example, a transformation named ‘DIM_Product’ is already open. On the left side there are two tabs called View and Design. Here, we build a Database Connection to get data from, or load data into, the data warehouse. In the Design tab we have different nodes, such as:

Input – Steps that extract data from a source.

Output – Steps that load data into a target.

Transform – Steps that apply connectors and logic to the data.
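The three kinds of nodes above map onto a simple pipeline. The Python sketch below is only an analogy, not PDI code: the function names, the sample CSV data, and the list standing in for a warehouse table are all made up for illustration. Generators pass rows along one at a time, loosely mirroring how rows flow from one step to the next in a transformation.

```python
import csv
import io


def input_step(csv_text):
    """Input: extract rows from a source (here, CSV text)."""
    yield from csv.DictReader(io.StringIO(csv_text))


def transform_step(rows):
    """Transform: apply logic to each row as it passes through."""
    for row in rows:
        yield {
            "product": row["product"].strip().title(),
            "price": float(row["price"]),
        }


def output_step(rows, target):
    """Output: load rows into a target (a list standing in for a table)."""
    loaded = 0
    for row in rows:
        target.append(row)
        loaded += 1
    return loaded


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    source = "product,price\n widget ,9.50\ngadget,12\n"
    dim_product = []
    count = output_step(transform_step(input_step(source)), dim_product)
    print(count, dim_product)
```

In Spoon the same chain is built visually by dragging the steps onto the canvas and connecting them.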

