Tuesday, 15 April 2014

ETL Terms

Source System:

Source System is a database, application, file, or other storage facility from which the data in a data warehouse is derived. Some of them are Flat files,Oracle Tables, Microsoft SQL server tables, COBOL Sources, XML files.

Mapping: 

Mapping is the definition of the relationship and data flow between source and target objects. It is a pictorial representation about the flow of data from source to target.

Metadata:

Metadata describes data and other structures, such as objects, business rules, and processes.
Metadata contains all the information about the source tables, target tables, the transformations, so that it will be useful and easy to perform transformations during the ETL process. A repository contains metadata.

Staging Area:

Staging area is place where you hold temporary tables on data warehouse server. Staging tables are connected to work area or fact tables. We basically need staging area to hold the data, and perform data cleansing and merging, before loading the data into warehouse.

Data Cleansing:

It is the process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process. The data cleansing technology improves data quality by validating, correctly naming and standardizing data. For example, a person's address may not be same in all source systems because of typos and postal code, city name may not match with address. These errors can be corrected by using data cleansing process and standardized data can be loaded in target systems (data warehouse).

Transformation:

Transformation is the process of manipulating data. Any manipulation beyond copying is a transformation. Examples include cleansing, aggregating, and integrating data from multiple sources. A transformation is a repository object that generates, modifies, or passes data.

Target System:

Target System is a database, application, file, or other storage facility to which the "transformed source data" is loaded in a data warehouse.

DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and ...