Thursday, 22 December 2022

DB - Interview Questions

DELETE DUPLICATE RECORD:

DELETE e1 FROM EMPLOYEE e1, EMPLOYEE e2

WHERE 

e1.NAME = e2.NAME

AND

e1.ID > e2.ID;


FIND THE 3 HIGHEST VALUE:

WITH CTE AS

(

SELECT 

NAME,SALARY,RN=ROW_NUMBER() OVER (ORDER BY SALARY DESC) 

FROM EMPLOYEE

)

SELECT NAME,SALARY FROM CTE WHERE RN = 3; 

FIND THE 2 HIGHEST VALUE:

SELECT 

MAX(SALARY) 

FROM EMPLOYEE

WHERE

SALARY NOT IN (SELECT MAX(SALARY) FROM EMPLOYEE)

GET EVEN/ODD ROWS

SELECT * FROM 

(SELECT *,ROW_NUMBER() OVER (ORDER BY id) AS ROW_ID FROM STUDENT)

WHERE ROW_ID %2 = 0

Wednesday, 21 December 2022

Postgresql - Extract YEAR/ MONTH from TIME STAMP

 SELECT 

EXTRACT(YEAR FROM created),

EXTRACT(MONTH FROM created),

COUNT(*) 

FROM 

schemaname.tablename

GROUP BY 

EXTRACT(YEAR FROM created),

EXTRACT(MONTH FROM created) 

ORDER BY 1,2

Excel - Subtract two time stamp columns/rows

Formula

=TEXT(A70-A69, "h:mm:ss")

Other samples,

Calculate hours between two times: 

=TEXT(B2-A2, "h")

Return hours and minutes between 2 times: 

=TEXT(B2-A2, "h:mm")

Return hours, minutes and seconds between 2 times: 

=TEXT(B2-A2, "h:mm:ss")

Pentaho - Regex to find a word in a line and replace the full line

 ^.*welcome.*$

Pentaho - Java Script example for if..else

 var out_record_count;

 if (start_pos == -1)

{

out_record_count=0

}

else if (end_pos == -1 )

{

out_record_count=0

}

else 

{

out_record_count=text.substring(start_pos+3, end_pos)

}

Pentaho Spoon - Unable to create the database cache

Couldn't read the database cache

org.pentaho.di.core.exception.KettleFileException 


The solution was to delete the db.cache* File in the user home dir


Linux: 

/home/abc/.kettle

db.cache-8.0.1-stable


Windows:

C:\Users\abc\.kettle

db.cache-8.3.0.0-371

Postgresql - Between two timestamp/date

SELECT * FROM schemaname.tablename

WHERE columnname between '2022-12-14 00:00:00' AND '2022-12-19 14:00:03' ORDER BY 1 DESC


NOTE: Includes both the dates 2022-12-14 & 2022-12-19

Tuesday, 20 December 2022

Pentaho - JavaScript to find string position & find substring

var text = Field_000;

var start_pos = text.indexOf(" W=");

var end_pos = text.indexOf(", U=");

var out_record_count = text.substring(start_pos+3, end_pos);

PENATHO - JOB EXECUTION SHELL COMMAND & LOG FILE WRITE ENABLE, LOCATION

 /home/abc/data-integration_8_1/kitchen.sh -file="/home/abc/myfolder/j_sample.kjb" –level:Basic -logfile:/home/abc/myfolder/Logfiles/j_sample_$(date +"%Y%m%d_%H%M%s").log

Postgresql - Add/ Subtract Hours/Minutes from Date/ Timestamp

SELECT now() - interval '5 minutes';


SELECT now() - interval '36 hours';

Sunday, 11 December 2022

Pentaho job execution shell command & with log file enable option

/home/dev/data-integration_8_1/kitchen.sh -file="/home/dev/abc/j_dataload.kjb" –level:Basic -logfile:/home/dev/abc/Logfiles/j_dataload_$(date +"%Y%m%d_%H%M%s").log

MySQL find duplicate records

#1

SELECT offset, COUNT(offset)  

FROM log 

GROUP BY offset  

HAVING COUNT(offset) > 1;  


#2

SELECT A.id, A.ts, CONVERT(TIMEDIFF (B.ts,A.ts),CHAR) AS timedifference,a.offset,a.input_offset

FROM log A INNER JOIN log B ON B.id = (A.id + 1)

ORDER BY TIMEDIFF (B.ts,A.ts) DESC 

DISCARD MILLISECOND PART FROM TIMESTAMP

 SELECT now()::timestamp(0);

Create Index

 CREATE INDEX sync_idx_abc ON schemaname.tablename(columnname);

Drop Not Null Constraint

 ALTER TABLE schemaname.tablename ALTER COLUMN abc DROP NOT NULL;

Text To Timestamp Conversion

 TO_TIMESTAMP('1999-01-01 00:00:00','YYYY-MM-DD HH24:MI:SS')

Get the record count from all tables in a schema

select table_schema, 

       table_name, 

       (xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count

from (

  select table_name, table_schema, 

         query_to_xml(format('select count(*) as cnt from %I.%I', table_schema, table_name), false, true, '') as xml_count

  from information_schema.tables

  where table_schema = 'test'

AND table_name IN ('abc') --<< change here for the schema you want

) t order by 3 DESC


---------------------------------------------

APPROACH 2 with WHERE CONDITION

---------------------------------------------


select table_schema, 

       table_name, 

       (xpath('/row/cnt/text()', xml_count))[1]::text::int as row_count

from (

  select table_name, table_schema, 

         query_to_xml(format('select count(*) as cnt from %I.%I  WHERE updated > ''2022-10-01 00:00:00''', table_schema, table_name), false, true, '') as xml_count

  from information_schema.tables

  where table_schema = 'test'

AND table_name NOT IN ('abc')

AND table_name NOT LIKE ('ad_%') --<< change here for the schema you want

) t order by 3 DESC

Tuple

 

<class 'tuple'> <class 'str'>
('apple', 'kiwi', 'cherry')
('apple', 'banana', 'cherry', 'orange')
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In [2], line 32
     30 thistuple = ("apple", "banana", "cherry")
     31 del thistuple
---> 32 print(thistuple)

NameError: name 'thistuple' is not defined

green = apple yellow = banana red = cherry Approach 1 : apple Approach 1 : banana Approach 1 : ['cherry', 'strawberry', 'raspberry'] Approach 2 : apple Approach 2 : ['mango', 'papaya', 'pineapple'] Approach 2 : cherry
2

List

MyList : ['apple', 'banana', 'cherry'] ['apple', 'banana', 'cherry', 'apple', 'cherry']
3
banana ['cherry', 'orange', 'kiwi'] ['apple', 'banana', 'cherry', 'orange'] ['cherry', 'orange', 'kiwi', 'melon', 'mango']
Yes, 'apple' is in the fruits list
['apple', 'blackcurrant', 'cherry']
['apple', 'banana', 'cherry', 'orange']
['apple', 'orange', 'banana', 'cherry']
['apple', 'banana', 'cherry', 'mango', 'pineapple', 'papaya']
['apple', 'cherry']
['apple', 'cherry']
['apple', 'banana']
[]
apple banana cherry
apple banana cherry
apple banana cherry
['apple', 'banana', 'mango']
['banana', 'kiwi', 'mango', 'orange', 'pineapple'] [23, 50, 65, 82, 100] ['pineapple', 'orange', 'mango', 'kiwi', 'banana'] [100, 82, 65, 50, 23]
[50, 65, 23, 82, 100]
['apple', 'banana', 'cherry'] ['apple', 'banana', 'cherry']
1 2

DBT - Models

Models are where your developers spend most of their time within a dbt environment. Models are primarily written as a select statement and ...