Sunday, 8 June 2014

Data Warehousing Fundamentals

Data warehouse is an information delivery system.

In this system, you integrate and transform enterprise data into information suitable for strategic decision making. You take all the historic data from the various operational systems,combine this internal data with any relevant data from outside sources, and pull them together. You resolve any conflicts in the  way data resides in different systems and transform the integrated data content into a format suitable for providing information to the various classes of users. Finally, you implement the information delivery methods.

Bill Inmon, considered to be the father of Data Warehousing provides the following definition:

“A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management’s decisions.”

Sean Kelly, another leading data warehousing practitioner defines the data warehouse in the following way. The data in the data warehouse is:
  • Separate
  • Available
  • Integrated
  • Time stamped
  • Subject oriented
  • Non-volatile
  • Accessible

Integrated Data

Here are some of the items that would need standardization:

  • Naming conventions
  • Codes
  • Data attributes
  • Measurements



Time-Variant Data

  • Allows for analysis of the past
  • Relates information to the present
  • Enables forecasts for the future

DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and ...