Sunday, 28 April 2024

DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and saved as a .sql file. While the definition is straightforward, the complexity of the execution will vary from environment to environment. Models will be written and rewritten as needs evolve and your organization finds new ways to maximize efficiency.

SQL is the language most dbt users will use, but it is not the only one for building models. Starting in version 1.3, dbt Core and dbt Cloud support Python models. Python models are useful for training or deploying data science models, for complex transformations, or where a specific Python package meets a need, such as using the dateutil library to parse dates.

DBT - Models and modern workflows

The top level of a dbt workflow is the project. A project is a directory containing a .yml file (the project configuration) and either .sql or .py files (the models). The project file tells dbt the project context, and the models let dbt know how to build a specific data set.
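As a rough illustration of that layout, the sketch below creates a minimal project directory by hand. The names my_project, my_profile, customers and raw.customers are hypothetical placeholders, and in practice a project is normally scaffolded with dbt's own tooling rather than written this way.

```shell
# Work in a scratch directory so nothing else is touched
workdir=$(mktemp -d)
cd "$workdir"

# A project is a directory holding the configuration file and the models
mkdir -p my_project/models

# Project configuration (hypothetical project and profile names)
cat > my_project/dbt_project.yml <<'EOF'
name: my_project
version: "1.0.0"
config-version: 2
profile: my_profile
model-paths: ["models"]
EOF

# A model: a single select statement saved as a .sql file
# (raw.customers is an assumed source table)
cat > my_project/models/customers.sql <<'EOF'
select
    id as customer_id,
    first_name,
    last_name
from raw.customers
EOF

# Show the resulting layout
find my_project -type f | sort
```

The profile key points at a connection profile defined outside the project, which is how dbt learns which database to run the models against.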

DBT - Jinja

SQL files can contain Jinja, a lightweight templating language. Using Jinja in SQL provides a way to use control structures, such as if statements and for loops, in your queries. It also enables repeated SQL to be shared through macros.
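To make the for loop case concrete, here is a small sketch that writes a model file containing Jinja. The model name order_payments, the referenced model raw_payments, and the column names are assumptions made up for the example. The heredoc is quoted so the shell leaves the Jinja braces untouched.

```shell
# Work in a scratch directory so nothing else is touched
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p models

# The for loop generates one aggregated column per payment method,
# so the repeated "sum(case when ...)" expression is written only once.
cat > models/order_payments.sql <<'EOF'
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}

select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount end) as {{ method }}_amount,
    {% endfor %}
    sum(amount) as total_amount
from {{ ref('raw_payments') }}
group by 1
EOF

# Show the template as dbt would read it
cat models/order_payments.sql
```

When dbt compiles this model, the loop should expand into three sum(...) columns, one for each entry in payment_methods, and ref is replaced with the name of the referenced relation.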

DBT Core vs. DBT Cloud

dbt is offered through two interfaces: dbt Core and dbt Cloud.

dbt Core is an open-source library that implements most of the functionality of dbt. It has a command-line interface (the dbt command you will come to love) that you can use to manage data transformations in your projects.

dbt Cloud is an enterprise solution for teams. On top of the CLI, dbt Cloud also provides a more user-friendly web-based IDE. With it, you don’t have to worry about database connections and editing YAML files so much (as you will see in the coming sections).

dbt Cloud also offers additional features like job scheduling, advanced integrations and high priority support.

What is DBT

Data Build Tool, or dbt, is built to transform data and is therefore the T in an ELT pipeline. I mention ELT because dbt is designed to work after data has been loaded and is ready for transformation. Additionally, out of the box, it cannot connect to multiple databases; it depends on data that has been loaded into, or is otherwise accessible to, the target database executing the dbt steps.

Wednesday, 26 July 2023

AWS

AMAZON ELASTIC COMPUTE CLOUD (EC2) is a part of Amazon.com's cloud-computing platform, Amazon Web Services (AWS), that allows users to rent virtual computers on which to run their own computer applications.

AMAZON S3 or AMAZON SIMPLE STORAGE SERVICE is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface.


LOGIN/CONNECTION ESTABLISHMENT


setx AWS_ACCESS_KEY_ID YOUR_ACCESS_KEY_ID

setx AWS_SECRET_ACCESS_KEY YOUR_SECRET_ACCESS_KEY

setx AWS_DEFAULT_REGION us-east-1
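The setx commands above are for the Windows command prompt. On Linux or macOS, a rough equivalent is to export the same variables in the shell. The values below are placeholders, not real credentials.

```shell
# Placeholder values; substitute your own credentials
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
export AWS_DEFAULT_REGION="us-east-1"

# Confirm the region is set, without echoing the secret values
echo "Region: $AWS_DEFAULT_REGION"
```

Note the difference in scope: setx stores the value for future command prompt windows, while export only affects the current shell session and the programs started from it.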


LIST FILES

aws s3 ls s3://abc/files/


COPY FILES

aws s3 cp F:\run.bat s3://abc/files/


DELETE FILES

aws s3 rm s3://abc/files/run.bat


Sample URL for file download

https://s3.amazonaws.com/abc/xyz.txt

TO KNOW ABOUT THE POSTGRESQL USERS

SELECT * FROM pg_catalog.pg_user ORDER BY usesysid DESC;

SELECT * FROM pg_authid ORDER BY 1;

UNIX - CHANGE THE EDITOR

select-editor

DATE SERIES - POSTGRESQL

SELECT dates::date FROM generate_series(CURRENT_DATE, CURRENT_DATE - 4, '-1 day'::interval) AS dates;


FIND AND REPLACE TEXT WITHIN A FILE USING SED COMMAND

Sample file hello.txt:

The is a test file created by nixCrft for demo purpose.

foo is good.

Foo is nice.

I love FOO.


sed 's/foo/bar/g' hello.txt

OUTPUT:

The is a test file created by nixCrft for demo purpose.

bar is good.

Foo is nice.

I love FOO.

To match all cases of foo (foo, FOO, Foo, FoO), add the I (capital I) flag, which is a GNU sed extension. The -i option edits hello.txt in place instead of printing the result:

sed -i 's/foo/bar/gI' hello.txt

OUTPUT (contents of hello.txt after the in-place edit):

The is a test file created by nixCrft for demo purpose.

bar is good.

bar is nice.

I love bar.
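The two commands can be tried end to end with the sketch below. It assumes GNU sed (for the I flag) and works in a temporary directory. The .bak suffix on -i is an addition not used above: it keeps a backup copy of the original file before editing in place.

```shell
# Work in a scratch directory so nothing else is touched
workdir=$(mktemp -d)
cd "$workdir"

# Recreate the sample file from the text
printf '%s\n' \
  'The is a test file created by nixCrft for demo purpose.' \
  'foo is good.' \
  'Foo is nice.' \
  'I love FOO.' > hello.txt

# Case-sensitive replace: the result goes to stdout, hello.txt is unchanged
case_sensitive=$(sed 's/foo/bar/g' hello.txt)
printf '%s\n' "$case_sensitive"

# Case-insensitive replace, edited in place; the original is kept as hello.txt.bak
sed -i.bak 's/foo/bar/gI' hello.txt
cat hello.txt
```

Keeping the backup makes it easy to compare the two versions with diff, or to restore the original if the pattern matched more than intended.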

SPLIT LARGE FILES INTO A NUMBER OF SMALLER FILES IN UNIX

 To split large files into smaller files in Unix, use the split command. At the Unix prompt, enter:


  split [options] filename prefix

  

Replace filename with the name of the large file you wish to split. Replace prefix with the name you wish to give the small output files. You can exclude [options], or replace it with either of the following:


  -l linenumber   (number of lines in each output file)


  -b bytes        (number of bytes in each output file)


Assume myfile is 3,000 lines long:


  split myfile

  

With no options, split writes 1,000 lines per file and uses the prefix x, so this will output three 1,000-line files: xaa, xab, and xac.


Working on the same file, this next example is more complex:


  split -l 500 myfile segment


This will output six 500-line files: segmentaa, segmentab, segmentac, segmentad, segmentae, and segmentaf.


Finally, assume myfile is a 160KB file:


  split -b 40k myfile segment


This will output four 40KB files: segmentaa, segmentab, segmentac, and segmentad.
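The line-based example can be checked with a short sketch. It builds a 3,000-line sample file with seq (the file contents are made up for the demo), splits it, and then joins the pieces back together with cat to confirm nothing was lost.

```shell
# Work in a scratch directory so nothing else is touched
workdir=$(mktemp -d)
cd "$workdir"

# Build a 3,000-line sample file, one number per line
seq 1 3000 > myfile

# Split into 500-line pieces named segmentaa, segmentab, ...
split -l 500 myfile segment

# List the pieces with their line counts
wc -l segment*

# Reassemble: the suffixes sort alphabetically, so the glob restores the original order
cat segment* > rebuilt

# cmp is silent and returns success when the two files are identical
cmp myfile rebuilt && echo "rebuilt file matches the original"
```

The same cat approach works for pieces produced with -b, since split cuts the file at exact byte offsets without adding or removing anything.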
