Wednesday, 26 July 2023

AWS

AMAZON ELASTIC COMPUTE CLOUD (EC2) is a part of Amazon.com's cloud-computing platform, Amazon Web Services (AWS), that allows users to rent virtual computers on which to run their own computer applications.

AMAZON S3 or AMAZON SIMPLE STORAGE SERVICE is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface.


LOGIN/CONNECTION ESTABLISHMENT


setx AWS_ACCESS_KEY_ID AKIAV7UQG7Q7C2OKGSM

setx AWS_SECRET_ACCESS_KEY I1DCBC3nwsRcV8+KSMSOLJMr1aZ8EzID

setx AWS_DEFAULT_REGION us-east-1


LIST FILES

aws s3 ls s3://abc/files/


COPY FILES

aws s3 cp F:\run.bat s3://abc/files/


DELETE FILES

aws s3 rm s3://abc/files/run.bat


Sample URL for file download

https://s3.amazonaws.com/abc/xyz.txt

TO KNOW ABOUT THE POSTGRESQL USERS

SELECT * FROM pg_catalog.pg_user ORDER BY usesysid DESC;

SELECT * FROM pg_authid ORDER BY 1

UNIX - CHANGE THE EDITOR

select-editor

DATE SERIES - POSTGRESQL

 SELECT dates::date FROM generate_series(CURRENT_DATE,   CURRENT_DATE-4, '-1 day'::interval)dates


FIND AND REPLACE TEXT WITHIN A FILE USING SED COMMAND

The is a test file created by nixCrft for demo purpose.

foo is good.

Foo is nice.

I love FOO.


sed 's/foo/bar/g' hello.txt

OUTPUT:

The is a test file created by nixCrft for demo purpose.

bar is good.

Foo is nice.

I love FOO.

To match all cases of foo (foo, FOO, Foo, FoO) add I (capitalized I) option as follows:

sed -i 's/foo/bar/gI' hello.txt

OUTPUT:

The is a test file created by nixCrft for demo purpose.

bar is good.

bar is nice.

I love bar.

SPLIT LARGE FILES INTO A NUMBER OF SMALLER FILES IN UNIX

 To split large files into smaller files in Unix, use the split command. At the Unix prompt, enter:


  split [options] filename prefix

  

Replace filename with the name of the large file you wish to split. Replace prefix with the name you wish to give the small output files. You can exclude [options], or replace it with either of the following:


  -l linenumber


  -b bytes


Assume myfile is 3,000 lines long:


  split myfile

  

This will output three 1000-line files: xaa, xab, and xac.


Working on the same file, this next example is more complex:


  split -l 500 myfile segment


This will output six 500-line files: segmentaa, segmentab, segmentac, segmentad, segmentae, and segmentaf.


Finally, assume myfile is a 160KB file:


  split -b 40k myfile segment


This will output four 40KB files: segmentaa, segmentab, segmentac, and segmentad.

DROP USER IN POSTGRSQL

REASSIGN OWNED BY aaa TO bbb;  

DROP OWNED BY aaa;

DROP USER aaa;

CRON SCHEDULE - DEBUGGER FILE

45 10 * * * sh /home/c13/abc/etl/ExecuteUpdateCheckerJob.sh >/home/c13/abc/cronlogs/updatecheck.log 2>&1

00 11 * * * sh /home/c13/abc/etl/ExecuteDataLoadJob.sh >/home/c13/abc/cronlogs/dataloadjob.log 2>&1

30 13 * * 3 sh /home/c13/abc/etl/Catchup/ExecuteCatchUpJob.sh >/home/c13/abc/cronlogs/catchupjob.log 2>&1

N th TOP RECORD

select id, work

from (

  select id, work, row_number() over (order by work) as rn

  from x

) t

where rn = 1 -- or 2 or 3 ...

SQL - LEAD & LAG

 select date_trunc('month', timestamp) as date,

       count(*) as count,

       100 * (count(*) - lag(count(*), 1) over (order by timestamp)) / lag(count(*), 1) over (order by timestamp)) || '%' as growth

from events

where event_name = 'created chart'

group by 1

order by 1

SQL - YEAR BY YEAR GROWTH OF SALES, MONTHLY AND AVERAGE SALES

YEAR BY YEAR GROWTH OF SALES


((Latest Year - Previous Year) / Previous Year) * 100


CALCULATE MONTHLY SALES REPORT IN MYSQL


select year(order_date),month(order_date),sum(sale)

     from sales

     group by year(order_date),month(order_date)

     order by year(order_date),month(order_date);


+------------------+-------------------+-----------+

| year(order_date) | month(order_date) | sum(sale) |

+------------------+-------------------+-----------+

|             2020 |                 1 |       408 |

|             2020 |                 2 |       320 |

|             2020 |                 3 |       540 |

|              ... |               ... |       ... |

+------------------+-------------------+-----------+

 

  select date_format(order_date,'%M'),sum(sale)

      from sales

      group by year(order_date),month(order_date)

      order by year(order_date),month(order_date);  

  

+------------------------------+-----------+

| date_format(order_date,'%M') | sum(sale) |

+------------------------------+-----------+

| January                      |       408 |

| Febuary                      |       320 |

| March                        |       540 |

| ...                          |       ... |

+------------------------------+-----------+


CALCULATE AVERAGE SALES PER DAY IN MYSQL


select avg(sale) from sales;

+-----------+

| avg(sale) |

+-----------+

|   22.0000 |

+-----------+


select product, avg(sale) from sales group by product;

+---------+-----------+

| product | avg(sale) |

+---------+-----------+

| A       |   23.3333 |

| B       |   20.0000 |

+---------+-----------+


CALCULATE AVERAGE SALE PER DAY FOR EACH DAY OF THE WEEK


SELECT   DAYNAME(order_date), AVG(sale)

    -> FROM     sales

    -> GROUP BY DAYNAME(order_date);

+---------------------+-----------+

| DAYNAME(order_date) | AVG(sale) |

+---------------------+-----------+

| Friday              |   15.0000 |

| Saturday            |   30.0000 |

| Sunday              |   20.0000 |

| Thursday            |   25.0000 |

| Wednesday           |   20.0000 |

+---------------------+-----------+

REMOVE EMPTY ROWS, NULL VALUES, MEAN, MEDIAN, MODE - PYTHON

REMOVE EMPTY ROWS

import pandas as pd

df = pd.read_csv('data.csv')

new_df = df.dropna()

print(new_df.to_string())


By default, the dropna() method returns a new DataFrame, and will not change the original.

If you want to change the original DataFrame, use the inplace = True argument

import pandas as pd

df = pd.read_csv('data.csv')

df.dropna(inplace = True)

print(df.to_string())

REPLACE NULL VALUES WITH THE NUMBER 130

import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace = True)

REPLACE ONLY FOR SPECIFIED COLUMNS

import pandas as pd

df = pd.read_csv('data.csv')

df["Calories"].fillna(130, inplace = True)

REPLACE USING MEAN, MEDIAN, OR MODE


MEAN = THE AVERAGE VALUE (THE SUM OF ALL VALUES DIVIDED BY NUMBER OF VALUES).


import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mean()

df["Calories"].fillna(x, inplace = True)

MEDIAN = THE VALUE IN THE MIDDLE, AFTER YOU HAVE SORTED ALL VALUES ASCENDING.

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].median()

df["Calories"].fillna(x, inplace = True)

MODE = THE VALUE THAT APPEARS MOST FREQUENTLY

import pandas as pd

df = pd.read_csv('data.csv')

x = df["Calories"].mode()[0]

df["Calories"].fillna(x, inplace = True)

Sunday, 26 March 2023

Friday, 17 February 2023

POSTGRSQL - DROP TRIGGERS

SELECT 

'DROP TRIGGER abc.' || trigger_name || ' ON ' || event_object_table || ';'

FROM information_schema.triggers

WHERE trigger_schema = 'abc';

SQL - FIND DUPLICATE RECORDS

SELECT offset, COUNT(offset)  

FROM log 

GROUP BY offset  

HAVING COUNT(offset) > 1;  

CALCULATE TIME DIFFERENCE BETWEEN TWO CONTINUOUS ROWS

SELECT A.id, A.ts, TIMEDIFF (B.ts,A.ts) AS timedifference

FROM log A INNER JOIN log B ON B.id = (A.id + 1)

ORDER BY TIMEDIFF (B.ts,A.ts) DESC



SELECT A.id, A.ts, CONVERT(TIMEDIFF (B.ts,A.ts),CHAR) AS timedifference,a.offset,a.input_offset

FROM log A INNER JOIN log B ON B.id = (A.id + 1)

ORDER BY TIMEDIFF (B.ts,A.ts) DESC 

FILE READ - REGULAR EXPRESSION

 (j_dataload)_[0-9]{8}_[0-9]{14}\.log

SHELL SCRIPT TO EXECUTE PENTAHO JOB

 #!/bin/bash -l

sh /home/abcserver/data-integration/kitchen.sh -file="/home/abcserver/etl/j_dataload.kjb" –level:Basic -logfile:/home/abcserver/etl/Logfiles/j_dataload_$(date +"%Y%m%d_%H%M%s").log

CALCULATE DIFFERENCE BETWEEN TWO TIMESTAMP IN POSTGRESQL

EXTRACT(DAY FROM MAX(joindate)-MIN(joindate)) AS DateDifference

EXTRACT(DAY FROM created-updated)

EXTRACT(HOUR FROM (updated - created)) AS HourDifference



CENTURY: It is the number of centuries

DAY: It is the day of the month (1-31) or the number of days

DECADE: It is the decade that is the year divided by 10.

DOW: It is the day of the week Sunday (0) to Saturday (6).

DOY: it is the day of the year that ranges from 1 to 366.

EPOCH: It is the total number of seconds in the interval

HOUR: It is the number of hours

MILLENNIUM: It is the number of millennium

MILLISECONDS: It is the second’s field, including fractional parts,

MINUTE: It is the minute (0-59) or a number of minutes.

MONTH: The number of months (1-12).

SECOND: It is the number of seconds.

WEEK: It is the number of weeks

YEAR: It is the year.

Monday, 23 January 2023

Python - Site Connectivity Checker

import urllib.request as urllib

This is a site connectivity checker program Input the url of the site you want to check: https://www.w3schools.com/ Checking connectivity Connected to https://www.w3schools.com/ succesfully The response code was: 200

Python - Email Slicer Project

 def main():

Welcome to the email slicer Input your email address: john@google.co Name : john Domain : google Extension: co

Python - What is print(f"...")

 The f means Formatted string literals and it's new in Python 3.6.

Python - Insert into Maria DB

#import mysql.connector
# Module Imports
import mariadb
import sys

try:
    conn = mariadb.connect(
        user="root",
        password="root",
        host="127.0.0.1",
        port=3306,
        database="my_workspace"

    )
except mariadb.Error as e:
    print(f"Error connecting to MariaDB Platform: {e}")
    sys.exit(1)

# Get Cursor


cur = conn.cursor()
 

#insert information
try:
    cur.execute("INSERT INTO employee (Emp_No,Emp_Name,Emp_Add,Emp_Phone,Dept_No,Dept_Name,Salary) VALUES (?, ?, ?, ?, ?, ?, ?)", (9,"Maria","Mumbai",8908908900,3445,"DB",22222))
except mariadb.Error as e:
    print(f"Error: {e}")

conn.commit()     
conn.close()

 

 

 

SAMPLE TABLE CREATION SQL:

 

CREATE TABLE `employee` (
    `Emp_NO` INT(11) NOT NULL,
    `Emp_Name` TEXT NOT NULL,
    `Emp_Add` TEXT NOT NULL,
    `Emp_Phone` TEXT NOT NULL,
    `Dept_No` TEXT NOT NULL,
    `Dept_Name` TEXT NOT NULL,
    `Salary` TEXT NOT NULL,
    PRIMARY KEY (`Emp_NO`)
)

Python - PIP vs PIP3

Pip3 is the Python3 version of pip.

If you use pip, then only the python2.7 version will be installed. You have to use pip3 for it to be installed on Python3. So to install packages in python3, you should use pip3.

NOTE:- Its not necessary that pip will install in python 2.7, if python2 is absent then pip will do it in python3. The above statement was if you have both the version of python installed.

Python - Connecting to MariaDB Server

First name: Ramesh, Last name: Sales First name: Suresh, Last name: Sales First name: Rajesh, Last name: Sales First name: Shyamu, Last name: Sales First name: Ramu, Last name: Sales First name: Mahesh, Last name: Sales

Python - Install mariadb

Command to install mariadb,

pip3 install mariadb

 

if needed, try this.

pip3 install mariadb-connector-python

 

Python - PIP package remove

pip uninstall camelcase

Python - How to check is MySQL client is configured/ installed

To check you have MySQLdb installed on your machine. 

#!/usr/bin/python

import MySQLdb

If the execution is success, then already installed.

If not, it will throw the below error,

Traceback (most recent call last):
   File "test.py", line 3, in <module>
      import MySQLdb
ImportError: No module named MySQLdb

 

Python - Install mysql

 Command to install mysql client,

pip install mysqlclient

 

Note : MySQL Bundle-3.23 through 5.5 and Python-2.4 through 2.7 are currently supported.

To install MySQLdb module, use the following command

pip install MySQL-python

 

Python - Exceptions Handling

Here is simple syntax of try....except...else blocks −

try:
   You do your operations here;
   ......................
except ExceptionI:
   If there is ExceptionI, then execute this block.
except ExceptionII:
   If there is ExceptionII, then execute this block.
   ......................
else:
   If there is no exception then execute this block. 
 
 
Error: can't find file or read data
 

Python Directories

 #!/usr/bin/python
import os

# Create a directory "test"
os.mkdir("test")

 

#!/usr/bin/python
import os

# This would give location of the current directory
os.getcwd() 


#!/usr/bin/python
import os

# Changing a directory to "/home/newdir"
os.chdir("/home/newdir")


#!/usr/bin/python
import os

# This would  remove "/tmp/test"  directory.
os.rmdir( "test"  )

Sunday, 22 January 2023

Python - File Operations

#FILE READ

 
welcome1 Python is a great language. Yeah its great!! Name of the file: sample.txt Closed or not : True Opening mode : r

 

# FILE WRITE
#!/usr/bin/python
# Open a file


fo = open("sample.txt", "w")
fo.write( "welcome1 Python is a great language.\nYeah its great!!\n")


# Close opend file

fo.close() 

Python Functions

 def my_function_test(inparam):

Maruthi Cars Honda Cars Kia Cars

Python - Get Date Time

from datetime import datetime

date and time : 2023-01-23 10:47:46.738581 time : 10:47:46 time approach 2: 10:47:46

Python - IF ELIF ELSE & FOR

 ##IF ELIF ELSE

 
a is greater than b

 

apple banana cherry
b a n a n a
apple banana

Python - Collections (Arrays)

Collections (Arrays)

There are 4 built-in data types in Python used to store collections of data.

List, Tuple, Set, and Dictionary, all with different qualities and usage.

  • List is a collection which is ordered and changeable. Allows duplicate members.
  • Tuple is a collection which is ordered and unchangeable. Allows duplicate members.
  • Set is a collection which is unordered, unchangeable#, and unindexed. No duplicate members.
  • Dictionary is a collection which is ordered## and changeable. No duplicate members.


#Set items are unchangeable, but you can remove items and add new items.

 

##As of Python version 3.7, dictionaries are ordered. In Python 3.6 and earlier, dictionaries are unordered.


List

  • Lists are used to store multiple items in a single variable. 
  • Lists are created using square brackets. 
  • List items are ordered, changeable, and allow duplicate values. 
  • List items are indexed, the first item has index [0], the second item has index [1] etc. 
  • When we say that lists are ordered, it means that the items have a defined order, and that order will not change. 
    • If you add new items to a list, the new items will be placed at the end of the list. 
  • The list is changeable, meaning that we can change, add, and remove items in a list after it has been created. 
  • A list can contain different data types.
  • From Python's perspective, lists are defined as objects with the data type 'list'    


Tuple

  • A tuple is a collection which is ordered and unchangeable. 
  • When we say that tuples are ordered, it means that the items have a defined order, and that order will not change. 
  • Tuples are unchangeable, meaning that we cannot change, add or remove items after the tuple has been created. 
    • But there is a workaround. You can convert the tuple into a list, change the list, and convert the list back into a tuple.

Wednesday, 18 January 2023

CRONTAB

 To edit the crontab

    crontab -e


To view the crontab

    crontab -l


After change, for save

    ctrl+c, :wq!


After change, not to save (wrongly given, back to original)

    ctrl+c, :q!

Postgresql - To find the time differences between two columns in a table

SELECT 

A.id, A.ts, TIMEDIFF (B.ts,A.ts) AS timedifference

FROM log A 

INNER JOIN log B ON B.id = (A.id + 1)

ORDER BY TIMEDIFF (B.ts,A.ts) DESC

Penatho - Expressions for FILE READ

The file name is, 'j_dataload_20230117_18031673958817.log'


SCENARIO: If a directory contains more number of files and you want to read the 'j_dateload_' files alone, then you have to use the below expression.


EXPRESSION:  (j_dataload)_[0-9]{8}_[0-9]{14}\.log

SELECTING WHERE TWO COLUMNS ARE IN A SET

 -- works in PostgreSQL, Oracle, MySQL, DB2, HSQLDB 

SELECT whatever 

FROM t                              

WHERE 

(col1, col2) IN ((val1a, val2a), (val1b, val2b), ...) ;

List views in postgresql

SELECT 

table_name 

FROM INFORMATION_SCHEMA.views 

WHERE 

table_schema = ANY (current_schemas(false)) 

ORDER BY 1

Kill job in Linux/Ubuntu

To list the running jobs,

ps -ef|grep java


To kill the job,

ill -9 pid


To kill the application by drag & drop

xkill

Sample shell script - To call the pentaho job

 #!/bin/bash -l

sh /home/abc/data-integration/kitchen.sh -file="/home/abc/etl/j_dataload.kjb" –level:Basic -logfile:/home/abc/etl/Logfiles/j_dataload_$(date +"%Y%m%d_%H%M%s").log

sh /home/abc/data-integration/kitchen.sh -file="/home/abc/etl/j_logfile_archive.kjb" –level:Basic -logfile:/home/abc/etl/Logfiles/j_logfile_archive_$(date +"%Y%m%d_%H%M%s").log


CRON Schedule

 Sample cron schedule,


0 14 * * * /home/abc/etl/ExecutJob.sh

Tuesday, 3 January 2023

Postgresql - Get the last 'X' months

To get the last 'x' months from the current date,

Method 1:

SELECT date_trunc('month', current_date - interval '4' month) as mydate

e.g. Current date is, 2022-01-03 19:30:00

Output is, 2022-09-01 00:00:00

Method 2:

SELECT CURRENT_DATE - INTERVAL '4 months' as mydate

e.g. Current date is, 2022-01-03 19:30:00

Output is, 2022-09-03 00:00:00

Pentaho - Read file and store content in the memory

 Step Name : Load file content in memory


Returns two columns, 

1. File Content

2. File Size

Penatho - Expressions

Compare two columns:

Source:


Comparison:


Pentaho Steps:




Use 'User Defined Java Expression' step in the pentho to achieve this, some sample expressions are below,

Count.equals(Tgt_Count) ? "TRUE" : "FALSE"

Value Type : String


Count<=Tgt_Count ? "OK" : "NOTOK"

Value Type : String

DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and ...