Published on Sep 26,2014
Email Post

Introduction to ETL

ETL is Extract, Transform and Load. It is an ETL process to extract data, mostly from different types of systems, transform it into a structure that’s more appropriate for reporting and analysis and finally loading it into the database.

There are multiple disparate sources that are loading the data into the data warehouse. So data needs to be extracted from multiple sources. There is no consistency in the system, thus one has to standardize all the data that is coming in and then load it into the data warehouse. Usually, most of the companies in the banking and industry sector use mainframe systems which are the old systems. They become a complex task for reporting and are now trying to migrate it to the data warehouse system.

Usually in the production environment, the files are extracted from the main systems and are sent to a unique sort of window in the file format. Each file has a specific standard format so they can send multiple files as well depending on the requirement. For example, if the files are sent at 3 am, the files are processed using an ETL tool. Some of the well known ETL tools are Informatica and Datastage which are costly and not open source. Talent is an open source system.  ETL tools are used to remove certain unwanted characters in the log in forums and then the files are loaded into an area called the staging area.

In the staging area, all the business tools are applied. For example, there is a business rule that says, if there is a particular record coming in, it should always be present in the master table record. If it is not present, it will not be moved further and hence, the master will be checked to see if the record is present.

Schedulers are also present to run the records exactly at 3 am or you can run the jobs when the file arrives. There can be time dependency and file dependency. So manual efforts in running the job is very less but at the end of the job runs, it has to get validated whether the jobs and the data has run successfully or not.

Got a question for us? Please mention them in the comments section and we will get back to you. 

Related Posts:

Architecture of Data warehouse

Importance and benefits of Data warehousing

Get started with Data warehousing

About Author
Published on Sep 26,2014

Share on

Browse Categories