Wednesday, 26 July 2023

AWS

AMAZON ELASTIC COMPUTE CLOUD (EC2) is a part of Amazon.com's cloud-computing platform, Amazon Web Services (AWS), that allows users to rent virtual computers on which to run their own computer applications.

AMAZON S3 or AMAZON SIMPLE STORAGE SERVICE is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface.

LOGIN/CONNECTION ESTABLISHMENT

setx AWS_ACCESS_KEY_ID AKIAV7UQG7Q7C2OKGSM

setx AWS_SECRET_ACCESS_KEY I1DCBC3nwsRcV8+KSMSOLJMr1aZ8EzID

setx AWS_DEFAULT_REGION us-east-1

LIST FILES

aws s3 ls s3://abc/files/

COPY FILES

aws s3 cp F:\run.bat s3://abc/files/

DELETE FILES

aws s3 rm s3://abc/files/run.bat

Sample URL for file download

https://s3.amazonaws.com/abc/xyz.txt

TO KNOW ABOUT THE POSTGRESQL USERS

SELECT * FROM pg_catalog.pg_user ORDER BY usesysid DESC;

SELECT * FROM pg_authid ORDER BY 1

UNIX - CHANGE THE EDITOR

select-editor

DATE SERIES - POSTGRESQL

SELECT dates::date FROM generate_series(CURRENT_DATE, CURRENT_DATE-4, '-1 day'::interval)dates

FIND AND REPLACE TEXT WITHIN A FILE USING SED COMMAND

The is a test file created by nixCrft for demo purpose.

foo is good.

Foo is nice.

I love FOO.

sed 's/foo/bar/g' hello.txt

OUTPUT:

The is a test file created by nixCrft for demo purpose.

bar is good.

Foo is nice.

I love FOO.

To match all cases of foo (foo, FOO, Foo, FoO) add I (capitalized I) option as follows:

sed -i 's/foo/bar/gI' hello.txt

OUTPUT:

The is a test file created by nixCrft for demo purpose.

bar is good.

bar is nice.

I love bar.

SPLIT LARGE FILES INTO A NUMBER OF SMALLER FILES IN UNIX

To split large files into smaller files in Unix, use the split command. At the Unix prompt, enter:

split [options] filename prefix

Replace filename with the name of the large file you wish to split. Replace prefix with the name you wish to give the small output files. You can exclude [options], or replace it with either of the following:

-l linenumber

-b bytes

Assume myfile is 3,000 lines long:

split myfile

This will output three 1000-line files: xaa, xab, and xac.

Working on the same file, this next example is more complex:

split -l 500 myfile segment

This will output six 500-line files: segmentaa, segmentab, segmentac, segmentad, segmentae, and segmentaf.

Finally, assume myfile is a 160KB file:

split -b 40k myfile segment

This will output four 40KB files: segmentaa, segmentab, segmentac, and segmentad.

DROP USER IN POSTGRSQL

REASSIGN OWNED BY aaa TO bbb;

DROP OWNED BY aaa;

DROP USER aaa;

CRON SCHEDULE - DEBUGGER FILE

45 10 * * * sh /home/c13/abc/etl/ExecuteUpdateCheckerJob.sh >/home/c13/abc/cronlogs/updatecheck.log 2>&1

00 11 * * * sh /home/c13/abc/etl/ExecuteDataLoadJob.sh >/home/c13/abc/cronlogs/dataloadjob.log 2>&1

30 13 * * 3 sh /home/c13/abc/etl/Catchup/ExecuteCatchUpJob.sh >/home/c13/abc/cronlogs/catchupjob.log 2>&1

N th TOP RECORD

select id, work

from (

select id, work, row_number() over (order by work) as rn

from x

) t

where rn = 1 -- or 2 or 3 ...

SQL - LEAD & LAG

select date_trunc('month', timestamp) as date,

count(*) as count,

100 * (count(*) - lag(count(*), 1) over (order by timestamp)) / lag(count(*), 1) over (order by timestamp)) || '%' as growth

from events

where event_name = 'created chart'

group by 1

order by 1

SQL - YEAR BY YEAR GROWTH OF SALES, MONTHLY AND AVERAGE SALES

YEAR BY YEAR GROWTH OF SALES

((Latest Year - Previous Year) / Previous Year) * 100

CALCULATE MONTHLY SALES REPORT IN MYSQL

select year(order_date),month(order_date),sum(sale)

from sales

group by year(order_date),month(order_date)

order by year(order_date),month(order_date);

+------------------+-------------------+-----------+

| year(order_date) | month(order_date) | sum(sale) |

+------------------+-------------------+-----------+

| 2020 | 1 | 408 |

| 2020 | 2 | 320 |

| 2020 | 3 | 540 |

| ... | ... | ... |

+------------------+-------------------+-----------+

select date_format(order_date,'%M'),sum(sale)

from sales

group by year(order_date),month(order_date)

order by year(order_date),month(order_date);

+------------------------------+-----------+

| date_format(order_date,'%M') | sum(sale) |

+------------------------------+-----------+

| January | 408 |

| Febuary | 320 |

| March | 540 |

| ... | ... |

+------------------------------+-----------+

CALCULATE AVERAGE SALES PER DAY IN MYSQL

select avg(sale) from sales;

+-----------+

| avg(sale) |

+-----------+

| 22.0000 |

+-----------+

select product, avg(sale) from sales group by product;

+---------+-----------+

| product | avg(sale) |

+---------+-----------+

| A | 23.3333 |

| B | 20.0000 |

+---------+-----------+

CALCULATE AVERAGE SALE PER DAY FOR EACH DAY OF THE WEEK

SELECT DAYNAME(order_date), AVG(sale)

-> FROM sales

-> GROUP BY DAYNAME(order_date);

+---------------------+-----------+

| DAYNAME(order_date) | AVG(sale) |

+---------------------+-----------+

| Friday | 15.0000 |

| Saturday | 30.0000 |

| Sunday | 20.0000 |

| Thursday | 25.0000 |

| Wednesday | 20.0000 |

+---------------------+-----------+

REMOVE EMPTY ROWS, NULL VALUES, MEAN, MEDIAN, MODE - PYTHON

REMOVE EMPTY ROWS

import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()

print(new_df.to_string())

By default, the dropna() method returns a new DataFrame, and will not change the original.

If you want to change the original DataFrame, use the inplace = True argument

import pandas as pd

df = pd.read_csv('data.csv')

df.dropna(inplace = True)

print(df.to_string())

REPLACE NULL VALUES WITH THE NUMBER 130

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace = True)

REPLACE ONLY FOR SPECIFIED COLUMNS

import pandas as pd

df = pd.read_csv('data.csv')

df["Calories"].fillna(130, inplace = True)

REPLACE USING MEAN, MEDIAN, OR MODE

MEAN = THE AVERAGE VALUE (THE SUM OF ALL VALUES DIVIDED BY NUMBER OF VALUES).

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mean()

df["Calories"].fillna(x, inplace = True)

MEDIAN = THE VALUE IN THE MIDDLE, AFTER YOU HAVE SORTED ALL VALUES ASCENDING.

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].median()

df["Calories"].fillna(x, inplace = True)

MODE = THE VALUE THAT APPEARS MOST FREQUENTLY

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mode()[0]

df["Calories"].fillna(x, inplace = True)

Sunday, 26 March 2023

SQL JOIN

Friday, 17 February 2023

POSTGRSQL - DROP TRIGGERS

SELECT

'DROP TRIGGER abc.' || trigger_name || ' ON ' || event_object_table || ';'

FROM information_schema.triggers

WHERE trigger_schema = 'abc';

SQL - FIND DUPLICATE RECORDS

SELECT offset, COUNT(offset)

FROM log

GROUP BY offset

HAVING COUNT(offset) > 1;

CALCULATE TIME DIFFERENCE BETWEEN TWO CONTINUOUS ROWS

SELECT A.id, A.ts, TIMEDIFF (B.ts,A.ts) AS timedifference

FROM log A INNER JOIN log B ON B.id = (A.id + 1)

ORDER BY TIMEDIFF (B.ts,A.ts) DESC

SELECT A.id, A.ts, CONVERT(TIMEDIFF (B.ts,A.ts),CHAR) AS timedifference,a.offset,a.input_offset

FROM log A INNER JOIN log B ON B.id = (A.id + 1)

ORDER BY TIMEDIFF (B.ts,A.ts) DESC

FILE READ - REGULAR EXPRESSION

(j_dataload)_[0-9]{8}_[0-9]{14}\.log

SHELL SCRIPT TO EXECUTE PENTAHO JOB

#!/bin/bash -l

sh /home/abcserver/data-integration/kitchen.sh -file="/home/abcserver/etl/j_dataload.kjb" –level:Basic -logfile:/home/abcserver/etl/Logfiles/j_dataload_$(date +"%Y%m%d_%H%M%s").log

CALCULATE DIFFERENCE BETWEEN TWO TIMESTAMP IN POSTGRESQL

EXTRACT(DAY FROM MAX(joindate)-MIN(joindate)) AS DateDifference

EXTRACT(DAY FROM created-updated)

EXTRACT(HOUR FROM (updated - created)) AS HourDifference

CENTURY: It is the number of centuries

DAY: It is the day of the month (1-31) or the number of days

DECADE: It is the decade that is the year divided by 10.

DOW: It is the day of the week Sunday (0) to Saturday (6).

DOY: it is the day of the year that ranges from 1 to 366.

EPOCH: It is the total number of seconds in the interval

HOUR: It is the number of hours

MILLENNIUM: It is the number of millennium

MILLISECONDS: It is the second’s field, including fractional parts,

MINUTE: It is the minute (0-59) or a number of minutes.

MONTH: The number of months (1-12).

SECOND: It is the number of seconds.

WEEK: It is the number of weeks

YEAR: It is the year.

Monday, 23 January 2023

Python - Site Connectivity Checker

import urllib.request as urllib

​def main(url):    print("Checking connectivity ")    response = urllib.urlopen(url)    print("Connected to" , url, "succesfully")    print("The response code was: ", response.getcode())​print("This is a site connectivity checker program")input_url = input("Input the url of the site you want to check: ")​main(input_url)

This is a site connectivity checker program
Input the url of the site you want to check: https://www.w3schools.com/
Checking connectivity 
Connected to https://www.w3schools.com/ succesfully
The response code was:  200

Python - Email Slicer Project

def main():

    print(" Welcome to the email slicer ")

    email_input = input("Input your email address: ")

    (username, domain) = email_input.split("@")

    (domain, extension) = domain.split(".")

    print("Name : ", username)

    print("Domain : ", domain)

    print("Extension: ", extension)

main()

 Welcome to the email slicer 
Input your email address: john@google.co
Name :  john
Domain :  google
Extension:  co

Python - What is print(f"...")

The f means Formatted string literals and it's new in Python 3.6.

Python - Insert into Maria DB

#import mysql.connector
# Module Imports
import mariadb
import sys

try:
    conn = mariadb.connect(
        user="root",
        password="root",
        host="127.0.0.1",
        port=3306,
        database="my_workspace"

    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    sys.exit(1)

# Get Cursor

cur = conn.cursor()

#insert information
try:
    cur.execute("INSERT INTO employee (Emp_No,Emp_Name,Emp_Add,Emp_Phone,Dept_No,Dept_Name,Salary) VALUES (?, ?, ?, ?, ?, ?, ?)", (9,"Maria","Mumbai",8908908900,3445,"DB",22222))
except mariadb.Error as e:
    print(f"Error: {e}")

conn.commit()
conn.close()

SAMPLE TABLE CREATION SQL:

CREATE TABLE `employee` (
   `Emp_NO` INT(11) NOT NULL,
   `Emp_Name` TEXT NOT NULL,
   `Emp_Add` TEXT NOT NULL,
   `Emp_Phone` TEXT NOT NULL,
   `Dept_No` TEXT NOT NULL,
   `Dept_Name` TEXT NOT NULL,
   `Salary` TEXT NOT NULL,
   PRIMARY KEY (`Emp_NO`)
)

Python - PIP vs PIP3

Pip3 is the Python3 version of pip.

If you use pip, then only the python2.7 version will be installed. You have to use pip3 for it to be installed on Python3. So to install packages in python3, you should use pip3.

NOTE:- Its not necessary that pip will install in python 2.7, if python2 is absent then pip will do it in python3. The above statement was if you have both the version of python installed.

Python - Connecting to MariaDB Server

# Module Imports

import mariadb

import sys

try:

    conn = mariadb.connect(

        user="root",

        password="root",

        host="127.0.0.1",

        port=3306,

        database="my_workspace"

except mariadb.Error as e:

    print(f"Error connecting to MariaDB Platform: {e}")

    sys.exit(1)

# Get Cursor

cur = conn.cursor()

some_name = "Sales"

cur.execute("SELECT Emp_Name,Dept_Name FROM EMPLOYEE WHERE Dept_Name=?", (some_name,))

for first_name, last_name in cur:

    print(f"First name: {first_name}, Last name: {last_name}")

First name: Ramesh, Last name: Sales
First name: Suresh, Last name: Sales
First name: Rajesh, Last name: Sales
First name: Shyamu, Last name: Sales
First name: Ramu,   Last name: Sales
First name: Mahesh, Last name: Sales

Python - Install mariadb

Command to install mariadb,

pip3 install mariadb

if needed, try this.

pip3 install mariadb-connector-python

Python - PIP package remove

pip uninstall camelcase

Python - How to check is MySQL client is configured/ installed

To check you have MySQLdb installed on your machine.

#!/usr/bin/python

import MySQLdb

If the execution is success, then already installed.

If not, it will throw the below error,

Traceback (most recent call last):
   File "test.py", line 3, in <module>
      import MySQLdb
ImportError: No module named MySQLdb

Python - Install mysql

Command to install mysql client,

pip install mysqlclient

Note : MySQL Bundle-3.23 through 5.5 and Python-2.4 through 2.7 are currently supported.

To install MySQLdb module, use the following command

pip install MySQL-python

Python - Exceptions Handling

Here is simple syntax of try....except...else blocks −

try:
   You do your operations here;
   ......................
except ExceptionI:
   If there is ExceptionI, then execute this block.
except ExceptionII:
   If there is ExceptionII, then execute this block.
   ......................
else:
   If there is no exception then execute this block.

#!/usr/bin/python

try:

   fh = open("\abc\testfile", "w")

   fh.write("This is my test file for exception handling!!")

except IOError:

   print ("Error: can\'t find file or read data")

else:

   print ("Written content in the file successfully")

   fh.close()

Error: can't find file or read data

Python Directories

#!/usr/bin/python
import os

# Create a directory "test"
os.mkdir("test")

#!/usr/bin/python
import os

# This would give location of the current directory
os.getcwd()

#!/usr/bin/python
import os

# Changing a directory to "/home/newdir"
os.chdir("/home/newdir")

#!/usr/bin/python
import os

# This would remove "/tmp/test" directory.
os.rmdir( "test" )

Sunday, 22 January 2023

Python - File Operations

#FILE READ

#!/usr/bin/python

# Open a file

fo = open("sample.txt", "r")

content = fo.read()

print(content)

# Close opened file

fo.close()

print ("Name of the file: ", fo.name)

print ("Closed or not : ", fo.closed)

print ("Opening mode : ", fo.mode)

welcome1 Python is a great language. Yeah its great!! Name of the file: sample.txt Closed or not : True Opening mode : r

# FILE WRITE
#!/usr/bin/python
# Open a file

fo = open("sample.txt", "w")
fo.write( "welcome1 Python is a great language.\nYeah its great!!\n")

# Close opend file

fo.close()

Python Functions

def my_function_test(inparam):

  print(inparam + " Cars")

my_function_test("Maruthi")

my_function_test("Honda")

my_function_test("Kia")

Maruthi Cars
Honda Cars
Kia Cars

Python - Get Date Time

from datetime import datetime

getsysdatetime = datetime.now()

gettime = getsysdatetime.strftime("%H:%M:%S")

gettimeapproach2 = datetime.now().strftime("%H:%M:%S")

print("date and time : ",getsysdatetime)

print("time : ",gettime)

print("time approach 2: ",gettime)

date and time :  2023-01-23 10:47:46.738581
time :  10:47:46
time approach 2:  10:47:46

Python - IF ELIF ELSE & FOR

##IF ELIF ELSE

a = 400

b = 66

if b > a:

  print("b is greater than a")

elif a == b:

  print("a and b are equal")

else:

  print("a is greater than b")

a is greater than b

fruits = ["apple", "banana", "cherry"]for x in fruits:  print(x)

apple
banana
cherry

for x in "banana":  print(x)

b
a
n
a
n
a

fruits = ["apple", "banana", "cherry"]for x in fruits:  print(x)  if x == "banana":    break

apple
banana

Python - Collections (Arrays)

Collections (Arrays)

There are 4 built-in data types in Python used to store collections of data.

List, Tuple, Set, and Dictionary, all with different qualities and usage.

List is a collection which is ordered and changeable. Allows duplicate members.

Tuple is a collection which is ordered and unchangeable. Allows duplicate members.

Set is a collection which is unordered, unchangeable#, and unindexed. No duplicate members.

Dictionary is a collection which is ordered## and changeable. No duplicate members.

#Set items are unchangeable, but you can remove items and add new items.

##As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier, dictionaries are unordered.

List

Lists are used to store multiple items in a single variable.
Lists are created using square brackets.
List items are ordered, changeable, and allow duplicate values.
List items are indexed, the first item has index [0], the second item has index [1] etc.
When we say that lists are ordered, it means that the items have a defined order, and that order will not change.

If you add new items to a list, the new items will be placed at the end of the list.

The list is changeable, meaning that we can change, add, and remove items in a list after it has been created.
A list can contain different data types.
From Python's perspective, lists are defined as objects with the data type 'list'

Tuple

A tuple is a collection which is ordered and unchangeable.
When we say that tuples are ordered, it means that the items have a defined order, and that order will not change.
Tuples are unchangeable, meaning that we cannot change, add or remove items after the tuple has been created.

But there is a workaround. You can convert the tuple into a list, change the list, and convert the list back into a tuple.

Wednesday, 18 January 2023

CRONTAB

To edit the crontab

crontab -e

To view the crontab

crontab -l

After change, for save

ctrl+c, :wq!

After change, not to save (wrongly given, back to original)

ctrl+c, :q!

Postgresql - To find the time differences between two columns in a table

SELECT

A.id, A.ts, TIMEDIFF (B.ts,A.ts) AS timedifference

FROM log A

INNER JOIN log B ON B.id = (A.id + 1)

ORDER BY TIMEDIFF (B.ts,A.ts) DESC

Penatho - Expressions for FILE READ

The file name is, 'j_dataload_20230117_18031673958817.log'

SCENARIO: If a directory contains more number of files and you want to read the 'j_dateload_' files alone, then you have to use the below expression.

EXPRESSION: (j_dataload)_[0-9]{8}_[0-9]{14}\.log

SELECTING WHERE TWO COLUMNS ARE IN A SET

-- works in PostgreSQL, Oracle, MySQL, DB2, HSQLDB

SELECT whatever

FROM t

WHERE

(col1, col2) IN ((val1a, val2a), (val1b, val2b), ...) ;

List views in postgresql

SELECT

table_name

FROM INFORMATION_SCHEMA.views

WHERE

table_schema = ANY (current_schemas(false))

ORDER BY 1

Kill job in Linux/Ubuntu

To list the running jobs,

ps -ef|grep java

To kill the job,

ill -9 pid

To kill the application by drag & drop

xkill

Sample shell script - To call the pentaho job

#!/bin/bash -l

sh /home/abc/data-integration/kitchen.sh -file="/home/abc/etl/j_dataload.kjb" –level:Basic -logfile:/home/abc/etl/Logfiles/j_dataload_$(date +"%Y%m%d_%H%M%s").log

sh /home/abc/data-integration/kitchen.sh -file="/home/abc/etl/j_logfile_archive.kjb" –level:Basic -logfile:/home/abc/etl/Logfiles/j_logfile_archive_$(date +"%Y%m%d_%H%M%s").log