Sunday, 28 April 2024

DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and saved as a .sql file. While the definition is straightforward, the complexity of the execution will vary from environment to environment. Models will be written and rewritten as needs evolve and your organization finds new ways to maximize efficiency.

SQL is the language most dbt users will utilize, but it is not the only one for building models. Starting in version 1.3, dbt Core and dbt Cloud support Python models. Python models are useful for training or deploying data science models, complex transformations, or where a specific Python package meets a need — such as using the dateutil library to parse dates.

DBT - Models and modern workflows

The top level of a dbt workflow is the project. A project is a directory of a .yml file (the project configuration). The project file tells dbt the project context, and the models let dbt know how to build a specific data set.

DBT - Jinja

SQL files can contain Jinja, a lightweight templating language. Using Jinja in SQL provides a way to use control structures in your queries. For example, if statements and for loops. It also enables repeated SQL to be shared through macros.SQL files can contain Jinja, a lightweight templating language. Using Jinja in SQL provides a way to use control structures in your queries. For example, if statements and for loops. It also enables repeated SQL to be shared through macros.

DBT Core vs. DBT Cloud

dbt is offered through two interfaces: dbt Core and dbt Cloud.

dbt Core is an open-source library that implements most of the functionality of dbt. It has a command-line interface (the dbt command you will come to love) that you can use to manage data transformations in your projects.

dbt Cloud is an enterprise solution for teams. On top of the CLI, dbt Cloud also provides a more user-friendly web-based IDE. With it, you don’t have to worry about database connections and editing YAML files so much (as you will see in the coming sections).

dbt Cloud also offers additional features like job scheduling, advanced integrations and high priority support.

What is DBT

Data Build Tool or dbt is built to transform data, and is therefore, the T in an ELT pipeline. I mentioned ELT because it is designed to work after data has been loaded, and is ready for transformation. Additionally, out of the box, it cannot connect with multiple databases, and depends on data that has been loaded, or otherwise accessible to the target database executing the dbt steps.

DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and ...