Subramanian G: July 2023

Wednesday, 26 July 2023

AWS

AMAZON ELASTIC COMPUTE CLOUD (EC2) is a part of Amazon.com's cloud-computing platform, Amazon Web Services (AWS), that allows users to rent virtual computers on which to run their own computer applications.

AMAZON S3 or AMAZON SIMPLE STORAGE SERVICE is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface.

LOGIN/CONNECTION ESTABLISHMENT

setx AWS_ACCESS_KEY_ID AKIAV7UQG7Q7C2OKGSM

setx AWS_SECRET_ACCESS_KEY I1DCBC3nwsRcV8+KSMSOLJMr1aZ8EzID

setx AWS_DEFAULT_REGION us-east-1

LIST FILES

aws s3 ls s3://abc/files/

COPY FILES

aws s3 cp F:\run.bat s3://abc/files/

DELETE FILES

aws s3 rm s3://abc/files/run.bat

Sample URL for file download

https://s3.amazonaws.com/abc/xyz.txt

TO KNOW ABOUT THE POSTGRESQL USERS

SELECT * FROM pg_catalog.pg_user ORDER BY usesysid DESC;

SELECT * FROM pg_authid ORDER BY 1

UNIX - CHANGE THE EDITOR

select-editor

DATE SERIES - POSTGRESQL

SELECT dates::date FROM generate_series(CURRENT_DATE, CURRENT_DATE-4, '-1 day'::interval)dates

FIND AND REPLACE TEXT WITHIN A FILE USING SED COMMAND

The is a test file created by nixCrft for demo purpose.

foo is good.

Foo is nice.

I love FOO.

sed 's/foo/bar/g' hello.txt

OUTPUT:

The is a test file created by nixCrft for demo purpose.

bar is good.

Foo is nice.

I love FOO.

To match all cases of foo (foo, FOO, Foo, FoO) add I (capitalized I) option as follows:

sed -i 's/foo/bar/gI' hello.txt

OUTPUT:

The is a test file created by nixCrft for demo purpose.

bar is good.

bar is nice.

I love bar.

SPLIT LARGE FILES INTO A NUMBER OF SMALLER FILES IN UNIX

To split large files into smaller files in Unix, use the split command. At the Unix prompt, enter:

split [options] filename prefix

Replace filename with the name of the large file you wish to split. Replace prefix with the name you wish to give the small output files. You can exclude [options], or replace it with either of the following:

-l linenumber

-b bytes

Assume myfile is 3,000 lines long:

split myfile

This will output three 1000-line files: xaa, xab, and xac.

Working on the same file, this next example is more complex:

split -l 500 myfile segment

This will output six 500-line files: segmentaa, segmentab, segmentac, segmentad, segmentae, and segmentaf.

Finally, assume myfile is a 160KB file:

split -b 40k myfile segment

This will output four 40KB files: segmentaa, segmentab, segmentac, and segmentad.

DROP USER IN POSTGRSQL

REASSIGN OWNED BY aaa TO bbb;

DROP OWNED BY aaa;

DROP USER aaa;

CRON SCHEDULE - DEBUGGER FILE

45 10 * * * sh /home/c13/abc/etl/ExecuteUpdateCheckerJob.sh >/home/c13/abc/cronlogs/updatecheck.log 2>&1

00 11 * * * sh /home/c13/abc/etl/ExecuteDataLoadJob.sh >/home/c13/abc/cronlogs/dataloadjob.log 2>&1

30 13 * * 3 sh /home/c13/abc/etl/Catchup/ExecuteCatchUpJob.sh >/home/c13/abc/cronlogs/catchupjob.log 2>&1

N th TOP RECORD

select id, work

from (

select id, work, row_number() over (order by work) as rn

from x

) t

where rn = 1 -- or 2 or 3 ...

SQL - LEAD & LAG

select date_trunc('month', timestamp) as date,

count(*) as count,

100 * (count(*) - lag(count(*), 1) over (order by timestamp)) / lag(count(*), 1) over (order by timestamp)) || '%' as growth

from events

where event_name = 'created chart'

group by 1

order by 1

SQL - YEAR BY YEAR GROWTH OF SALES, MONTHLY AND AVERAGE SALES

YEAR BY YEAR GROWTH OF SALES

((Latest Year - Previous Year) / Previous Year) * 100

CALCULATE MONTHLY SALES REPORT IN MYSQL

select year(order_date),month(order_date),sum(sale)

from sales

group by year(order_date),month(order_date)

order by year(order_date),month(order_date);

+------------------+-------------------+-----------+

| year(order_date) | month(order_date) | sum(sale) |

+------------------+-------------------+-----------+

| 2020 | 1 | 408 |

| 2020 | 2 | 320 |

| 2020 | 3 | 540 |

| ... | ... | ... |

+------------------+-------------------+-----------+

select date_format(order_date,'%M'),sum(sale)

from sales

group by year(order_date),month(order_date)

order by year(order_date),month(order_date);

+------------------------------+-----------+

| date_format(order_date,'%M') | sum(sale) |

+------------------------------+-----------+

| January | 408 |

| Febuary | 320 |

| March | 540 |

| ... | ... |

+------------------------------+-----------+

CALCULATE AVERAGE SALES PER DAY IN MYSQL

select avg(sale) from sales;

+-----------+

| avg(sale) |

+-----------+

| 22.0000 |

+-----------+

select product, avg(sale) from sales group by product;

+---------+-----------+

| product | avg(sale) |

+---------+-----------+

| A | 23.3333 |

| B | 20.0000 |

+---------+-----------+

CALCULATE AVERAGE SALE PER DAY FOR EACH DAY OF THE WEEK

SELECT DAYNAME(order_date), AVG(sale)

-> FROM sales

-> GROUP BY DAYNAME(order_date);

+---------------------+-----------+

| DAYNAME(order_date) | AVG(sale) |

+---------------------+-----------+

| Friday | 15.0000 |

| Saturday | 30.0000 |

| Sunday | 20.0000 |

| Thursday | 25.0000 |

| Wednesday | 20.0000 |

+---------------------+-----------+

REMOVE EMPTY ROWS, NULL VALUES, MEAN, MEDIAN, MODE - PYTHON

REMOVE EMPTY ROWS

import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()

print(new_df.to_string())

By default, the dropna() method returns a new DataFrame, and will not change the original.

If you want to change the original DataFrame, use the inplace = True argument

import pandas as pd

df = pd.read_csv('data.csv')

df.dropna(inplace = True)

print(df.to_string())

REPLACE NULL VALUES WITH THE NUMBER 130

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace = True)

REPLACE ONLY FOR SPECIFIED COLUMNS

import pandas as pd

df = pd.read_csv('data.csv')

df["Calories"].fillna(130, inplace = True)

REPLACE USING MEAN, MEDIAN, OR MODE

MEAN = THE AVERAGE VALUE (THE SUM OF ALL VALUES DIVIDED BY NUMBER OF VALUES).

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mean()

df["Calories"].fillna(x, inplace = True)

MEDIAN = THE VALUE IN THE MIDDLE, AFTER YOU HAVE SORTED ALL VALUES ASCENDING.

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].median()

df["Calories"].fillna(x, inplace = True)

MODE = THE VALUE THAT APPEARS MOST FREQUENTLY

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mode()[0]

df["Calories"].fillna(x, inplace = True)