ETL Concepts and Business Logic
It is a process of extracting the data and transforming into required business format via loading into DWH.
ETL is often a complex combination of process and technology that consumes a significant portion of the data warehouse development efforts and requires the skills of business analysts, database designers, and application developers. It is not a one time event as new data is added to the Data Warehouse periodically – monthly, daily, hourly. Because ETL is an integral, on-going, and recurring part of a data warehouse
o Well documented
o Easily changeable
There are 2 types of ETL’s used in implementing data acquisition.
Code based ETL:
An ETL application can be developed using some programming languages such as SQL PL/SQL.
o Oracle SQL Loader
o SAS Based
o SAS Access
o Tera data Load
GUI based ETL :
An ETL application can be designed using simple graphically user interface point and click techniques.
o Data stage
o AB Initio
o Data Manager
o Data services
It is a process of reading the data from multiple operational source system.
The following are the types of operation source systems.
o Sql server
o People soft
o Cobol files
o Flat Files
o XML files
o Excel sheets
It is a process of converting the data and clinzing the data in the staging area. Staging area is a temporary memory or buffer where the data transformation activities take place. The following one the various types of data transformation activities take place
o Data Merging
o Data clinzing
o Data scrubbing
o Data aggregation
It is process of integrating the data from multiple operational source system.
There are 2 types of data merge operations.
o Horizontal Merging(Join)
o Vertical Merging(Union)
It’s a process of changing inconsistencies and inaccuracies; or a process of removing unwanted data (filtering)
It is a process of deriving new data definitions.
It’s a process of calculating the summaries using an aggregate function called Sum ()
It is a process of inserting data into a target system. there is 2 types of data loads.
Initial Load or full Load:
It’s a process of inserting the data into an empty the data into an empty target tables very first time. All the required data loads into empty target tables.
Incremental data load or delta load:
It’s a process of loading only new record or any changes records which takes place after initial load
An ETL Use-Case
In order to demonstrate this, we’ll use a use-case that is similar in concept. In our ETL development example, the human resources department uses a recruiting tool written using a dBase- or FoxPro-like data structure. But the employee data is stored in Microsoft SQL Server.
For reference, here is the new_empl.dbf and using an XSLT transform that reads a table and displays it as HTML.
Our steps will be then, to Extract, Transform, and Load
• The extract step will take the data from the dBASE III file and convert it into a more usable format – XML
• The transform step will change the date format into standard ISO dates, split the name into first and last names, and assign the appropriate manager based on whether the employee is being assigned to inside sales or the external sales force.
• The load step will take the resulting file and send it to SQL Server.
• As a side-effect, it would be nice to get a report of data loaded.