The extraction methods
in data warehouse depend on the source system, performance and business
requirements. There are two types of extractions, Logical and Physical. We will
see in detail about the logical and physical designs.
Logical extraction
There are two types of logical extraction methods:
Full Extraction: Full extraction is used when the data needs to be extracted and loaded for the first time. In full extraction, the data from the source is extracted completely. This extraction reflects the current data available in the source system.
Incremental Extraction: In incremental extraction, the changes in source data need to be tracked since the last successful extraction. Only these changes in data will be extracted and then loaded. These changes can be detected from the source data which have the last changed timestamp. Also a change table can be created in the source system, which keeps track of the changes in the source data.
One more method to get the incremental changes is to extract the complete source data and then do a difference (minus operation) between the current extraction and last extraction. This approach causes a performance issue.
Physical extraction
The data can be extracted physically by two methods:
Online Extraction: In online extraction the data is extracted directly from the source system. The extraction process connects to the source system and extracts the source data.
Offline Extraction: The data from the source system is dumped outside of the source system into a flat file. This flat file is used to extract the data. The flat file can be created by a routine process daily.
Logical extraction
There are two types of logical extraction methods:
Full Extraction: Full extraction is used when the data needs to be extracted and loaded for the first time. In full extraction, the data from the source is extracted completely. This extraction reflects the current data available in the source system.
Incremental Extraction: In incremental extraction, the changes in source data need to be tracked since the last successful extraction. Only these changes in data will be extracted and then loaded. These changes can be detected from the source data which have the last changed timestamp. Also a change table can be created in the source system, which keeps track of the changes in the source data.
One more method to get the incremental changes is to extract the complete source data and then do a difference (minus operation) between the current extraction and last extraction. This approach causes a performance issue.
Physical extraction
The data can be extracted physically by two methods:
Online Extraction: In online extraction the data is extracted directly from the source system. The extraction process connects to the source system and extracts the source data.
Offline Extraction: The data from the source system is dumped outside of the source system into a flat file. This flat file is used to extract the data. The flat file can be created by a routine process daily.