Tuesday, 19 August 2025

Data Sharing

 

1. Create Share

CREATE SHARE my_share;

2. Grant privileges to share

GRANT USAGE ON DATABASE my_db TO SHARE my_share;

GRANT USAGE ON SCHEMA my_db.my_schema TO SHARE my_share;

GRANT SELECT ON TABLE my_db.my_schema.my_table TO SHARE my_share;

3. Add consumer account(s)

ALTER SHARE my_share ADD ACCOUNT a123bc;

4. Import share

CREATE DATABASE my_db FROM SHARE provider_account.my_share;

Monday, 18 August 2025

Materialized View & Warehouse

Materialized View

To know the usage history,

    SELECT * FROM TABLE(information_schema.materialized_view_refresh_history());

    SELECT * FROM TABLE(information_schema.materialized_view_refresh_history(materialized_view_name => 'mname'));

    SELECT * FROM snowflake.account_usage.materialized_view_refresh_history;


Warehouse

Resizing: Warehouses can be resized even while queries are running or while suspended.

It impacts only future queries, not the ones already running.


Scale Up vs Scale Out

    Scale Up (Resize) - for more complex queries

    Scale Out (Multi-cluster) - for more concurrent users (more queries)
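The two approaches map to different warehouse settings. A minimal sketch (the warehouse name my_wh is an assumption; multi-cluster warehouses require Enterprise Edition or higher):

```sql
-- Scale up: resize the warehouse to handle more complex queries
ALTER WAREHOUSE my_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Scale out: allow extra clusters to handle more concurrent queries
ALTER WAREHOUSE my_wh SET MIN_CLUSTER_COUNT = 1 MAX_CLUSTER_COUNT = 3;
```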




Micro - Partitions & Clustering

Micro - Partitions
  • Immutable - Can't be changed
  • New data - Added in new partitions
Clustering

Get the clustering key details from the existing tables,

    SELECT * FROM information_schema.tables WHERE clustering_key IS NOT NULL;

To know about the cluster in detail,

    SELECT SYSTEM$CLUSTERING_INFORMATION ('table_name');

To know about the particular column cluster in detail,

    SELECT SYSTEM$CLUSTERING_INFORMATION ('table_name','(column_name)');

To know about the clustering depth,

    SELECT SYSTEM$CLUSTERING_DEPTH ('table_name');

CACHING in Snowflake

Result Cache

  • Stores the results of a query (Cloud Services)
  • Same queries can use that cache in the future
    • Table data has not changed
    • Micro-partitions have not changed
    • Query doesn't include UDFs or external functions
    • Sufficient privileges & results are still available
  • Very fast result (persisted query result)
  • Avoids re-execution
  • Can be disabled by using
    • USE_CACHED_RESULT parameter
  • Purged after 24 hours if the query is not re-used
  • Retained for up to 31 days if the query is re-used

Tip : The result cache resides in the CLOUD SERVICES layer
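USE_CACHED_RESULT is a session (or account) level parameter; for example:

```sql
-- Disable the result cache for the current session (useful for benchmarking)
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Re-enable it
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```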

Data Cache

  • Local SSD cache
  • Cannot be shared with other warehouses
  • Improve performance of subsequent queries that use the same data
  • Purged if warehouse is suspended or resized
  • Queries with similar data ⇒ same warehouse
  • Size depends on warehouse size
Tip : The data cache resides in the QUERY PROCESSING layer


Metadata Cache

  • Stores statistics and metadata about objects
  • Properties for query optimization and processing
    • Range of values in micro-partition
  • Count rows, count distinct values, max/min value
  • Without using virtual warehouse
  • DESCRIBE + system-defined functions
  • Also called the "Metadata store"
  • Virtual Private Edition: Dedicated metadata store
Tip : The metadata cache resides in the CLOUD SERVICES layer
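Queries answerable purely from metadata run without a virtual warehouse; a sketch (table_name and order_date are placeholders):

```sql
-- Served from the metadata cache: no virtual warehouse needed
SELECT COUNT(*) FROM table_name;

-- Min/max of a column can also be answered from micro-partition metadata
SELECT MIN(order_date), MAX(order_date) FROM table_name;
```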

Query History

The query history can be viewed in 3 ways,


1. Using SNOWSIGHT( Web UI)

2. Using INFORMATION_SCHEMA

    SELECT * FROM TABLE (information_schema.query_history()) ORDER BY start_time;

3. Using ACCOUNT_USAGE

    SELECT * FROM snowflake.account_usage.query_history;

UNLOADING

Syntax:

COPY INTO @stage_name FROM (SELECT col1, col2, col3 FROM table_name)

FILE_FORMAT = (TYPE = CSV)

HEADER = TRUE


Additional Parameters:

SINGLE

Use the SINGLE parameter to specify whether to unload into a single file. The default is FALSE, which means the data is split across multiple files.


MAX_FILE_SIZE

Defines the maximum size (in bytes) of each generated file.
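Both parameters can be combined on an unload; a sketch (stage, table, and column names are placeholders):

```sql
-- Unload into a single file of at most ~100 MB
COPY INTO @stage_name/out/result.csv
FROM (SELECT col1, col2 FROM table_name)
FILE_FORMAT = (TYPE = CSV)
HEADER = TRUE
SINGLE = TRUE
MAX_FILE_SIZE = 104857600;
```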

COPY - Parameters

CopyOption Description Values
ON_ERROR Specifies the error handling for the load operation CONTINUE | SKIP_FILE | SKIP_FILE_num | 'SKIP_FILE_num%' | ABORT_STATEMENT
SIZE_LIMIT Specifies the maximum size (in bytes) of data to be loaded <num>
PURGE Remove files after successful load TRUE | FALSE
RETURN_FAILED_ONLY Return only files that have failed to load TRUE | FALSE
MATCH_BY_COLUMN_NAME Load semi-structured data into columns whose names match the target column names CASE_SENSITIVE | CASE_INSENSITIVE | NONE
ENFORCE_LENGTH Truncate text strings that exceed the target column length TRUE | FALSE
TRUNCATECOLUMNS Truncate text strings that exceed the target column length TRUE | FALSE
FORCE Load files even if loaded before TRUE | FALSE
LOAD_UNCERTAIN_FILES Load files even if load status unknown TRUE | FALSE

Sunday, 17 August 2025

INSERT OVERWRITE

  • Specifies that the target table should be truncated before inserting the values into the table.
  • To use the OVERWRITE option on INSERT, you must use a role that has DELETE privilege on the table because OVERWRITE will delete the existing records in the table.
E.g.

INSERT OVERWRITE INTO table_name
  SELECT * FROM src_table_name
  WHERE city = 'abcd';

[COPY] File Format Parameters

Property Property Type
TYPE String
RECORD_DELIMITER String
FIELD_DELIMITER String
FILE_EXTENSION String
SKIP_HEADER Integer
DATE_FORMAT String
TIME_FORMAT String
TIMESTAMP_FORMAT String
BINARY_FORMAT String
ESCAPE String
ESCAPE_UNENCLOSED_FIELD String
TRIM_SPACE Boolean
FIELD_OPTIONALLY_ENCLOSED_BY String
NULL_IF List
COMPRESSION String
ERROR_ON_COLUMN_COUNT_MISMATCH Boolean
VALIDATE_UTF8 Boolean
SKIP_BLANK_LINES Boolean
REPLACE_INVALID_CHARACTERS Boolean
EMPTY_FIELD_AS_NULL Boolean
SKIP_BYTE_ORDER_MARK Boolean
ENCODING String

Tuesday, 12 August 2025

Combining Streams & Tasks

CREATE TASK my_task 

WAREHOUSE = my_wh

SCHEDULE = '15 MINUTE'

WHEN SYSTEM$STREAM_HAS_DATA('my_stream_name')

AS 

INSERT INTO my_tgt_table (time_col) VALUES (CURRENT_TIMESTAMP);

Data Sampling Methods

"Data sampling" refers to selecting a subset of data from a larger dataset, typically for testing, analysis, or performance purposes.

  • ROW or BERNOULLI
    • Every ROW is chosen with percentage p
    • More "Randomness"
    • Smaller tables
    • e.g. SELECT * FROM table_name SAMPLE ROW (<p>) SEED(15); 
  • BLOCK or SYSTEM
    • Every BLOCK is chosen with percentage p
    • More "Effectiveness"
    • Larger tables
    • e.g. SELECT * FROM table_name SAMPLE SYSTEM(<p>) SEED(15);
    Here, <p> returns approximately p% of the table rows randomly.

Snowflake - Important one word questions and answers

Maximum length of a VARIANT data type ::: 16 MB uncompressed
-#-#-#-
What is unstructured data ::: Does not fit into any pre-defined data models,
  • video files
  • audio files
  • documents
-#-#-#-
Snowflake share URL ::: below are the supported,
  • Scoped URL 
    • Temporary URL, expires in 24 hours
    • e.g. SELECT BUILD_SCOPED_FILE_URL(@stage_name,'logo.png');
  • File URL 
    • Permanent one
    • e.g. SELECT BUILD_STAGE_FILE_URL(@stage_name,'logo.png');
  • Pre Signed URL 
    • HTTPS URL used to access file via a web browser
    • e.g. SELECT GET_PRESIGNED_URL(@stage_name,'logo.png',60);
    • here, 60 denotes the seconds to expiry
-#-#-#-
Directory table ::: Stores metadata about staged files
By default it is not enabled; enable it as below

e.g. CREATE OR REPLACE STAGE my_internal_stage
  FILE_FORMAT = my_json_format
  DIRECTORY = ( ENABLE = TRUE );

e.g. To query the directory table
    SELECT * FROM DIRECTORY(@my_internal_stage);

Note: At first the data will not be visible; you need to refresh the directory table manually,

ALTER STAGE my_internal_stage REFRESH;

-#-#-#-
Streams ::: Record (DML) changes made to a table

3 columns will be newly added,
  1. metadata$action
  2. metadata$isupdate
  3. metadata$row_id
STALE ::: 

A stream becomes stale (no longer usable) when its offset falls outside the data retention period of the source table.

The STALE_AFTER column indicates when the stream is predicted to become stale.
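A minimal stream round-trip (src_table and tgt_table are assumed names):

```sql
-- Create a stream to track DML changes on a source table
CREATE OR REPLACE STREAM my_stream ON TABLE src_table;

-- Rows changed since the last consumption, including the metadata$ columns
SELECT * FROM my_stream;

-- Consuming the stream in a DML statement advances its offset
INSERT INTO tgt_table SELECT col1, col2 FROM my_stream;
```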

TIME TRAVEL ::: Undrop fails if an object with the same name already exists.
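Typical Time Travel operations, sketched with placeholder names:

```sql
-- Query the table as it was 5 minutes ago
SELECT * FROM my_table AT(OFFSET => -60*5);

-- Query the table as it was before a specific statement ran
SELECT * FROM my_table BEFORE(STATEMENT => '<query_id>');

-- Restore a dropped table (fails if a table with the same name exists)
UNDROP TABLE my_table;
```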

FAIL SAFE :::
  • Protection of historical data in case of disaster
  • No user interaction & recoverable only by snowflake
  • Non configurable 7 day period
  • Period starts immediately after Time Travel period ends
  • Contributes to storage cost
TIME TRAVEL & FAIL SAFE - STORAGE COST

Use below queries to get the information,

SELECT * FROM snowflake.account_usage.storage_usage;

SELECT * FROM snowflake.account_usage.table_storage_metrics;

Monday, 11 August 2025

Stage with JSON load

CREATE FILE FORMAT:

  CREATE OR REPLACE FILE FORMAT my_json_format

  TYPE = 'JSON'

  COMPRESSION = 'AUTO'

  ENABLE_OCTAL = FALSE

  ALLOW_DUPLICATE = FALSE

  STRIP_OUTER_ARRAY = TRUE

  IGNORE_UTF8_ERRORS = FALSE;

CREATE INTERNAL NAMED STAGE:

  CREATE OR REPLACE STAGE my_internal_stage

  FILE_FORMAT = my_json_format;

CREATE TABLE:

CREATE OR REPLACE TABLE json_data_raw (

  id NUMBER AUTOINCREMENT,

  raw_json VARIANT,

  load_time TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP

);

LIST THE FILES IN THE STAGE:

LIST @my_internal_stage; 

REMOVE THE FILES FROM STAGE:

REMOVE @my_internal_stage; 

COPY COMMAND TO LOAD THE DATA (INTO A PARTICULAR COLUMN IN THE TABLE)

COPY INTO json_data_raw(raw_json)

FROM @my_internal_stage/samplejson.json

FILE_FORMAT = (FORMAT_NAME = 'my_json_format');

OUTPUT OF THE LOAD:


 QUERY THE TABLE:
SELECT
RAW_JSON:name::string AS emp_name,
RAW_JSON:email::string AS emp_email,
RAW_JSON:isActive::boolean AS emp_status
FROM json_data_raw;

 

 

 

Stored Procedures

         

Typically performs database operations, usually administrative operations like INSERT, UPDATE, or DELETE.

Doesn't need to return a value.

Supported languages are,

  • Snowflake scripting (Snowflake SQL + procedural logic)
  • Javascript
  • Snowpark API (Python, Scala & Java)

General Syntax:

CREATE PROCEDURE proc_name (table_name VARCHAR, n1 INT)
RETURNS INT
LANGUAGE SQL
EXECUTE AS CALLER
AS
BEGIN
  UPDATE IDENTIFIER(:table_name) SET col_name = :n1;
  RETURN :n1;
END;

NOTE: 

  • If the argument refers to an object (e.g. a table name) in a SQL statement, use IDENTIFIER(:argument)
  • If the argument is used as a value in a SQL statement, use :argument

PRIVILEGES:

  • Runs either with caller's or owner's rights.
  • By default, will run with owner's privileges
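The rights model is chosen at creation time via EXECUTE AS; a sketch (procedure names are assumptions):

```sql
-- Runs with the owner's privileges (the default)
CREATE PROCEDURE admin_proc() RETURNS VARCHAR LANGUAGE SQL
EXECUTE AS OWNER
AS BEGIN RETURN 'owner rights'; END;

-- Runs with the caller's privileges
CREATE PROCEDURE caller_proc() RETURNS VARCHAR LANGUAGE SQL
EXECUTE AS CALLER
AS BEGIN RETURN 'caller rights'; END;
```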


UDF - User Defined Functions

Typically calculate and return a value.

Supported  Languages:

  • SQL
  • Python
  • Java
  • Javascript


Scalar functions: Returns one output row per input row

Tabular functions: Returns a tabular value for each row

        

A function is a securable schema-level object 


 

General syntax for function:

CREATE FUNCTION add_two (n INT)

RETURNS INT

AS

$$

n + 2

$$;
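A tabular (table) function, by contrast, declares RETURNS TABLE(...) and is queried via TABLE(); a sketch with assumed table and column names:

```sql
CREATE OR REPLACE FUNCTION orders_for_customer (cust_id INT)
RETURNS TABLE (order_id INT, amount NUMBER(12,2))
AS
$$
  SELECT order_id, amount FROM orders WHERE customer_id = cust_id
$$;

-- Returns a tabular result for the given customer
SELECT * FROM TABLE(orders_for_customer(42));
```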

Approximation of the most frequent values

 Syntax: 

APPROX_TOP_K( expr, k, counters )

Arg1: the column name for which you want to find the most common values.

Arg2: the number of top values to return; e.g. for the top 10 most common values, set k to 10.

Arg3: the maximum number of distinct values that can be tracked.

 

Normal method to get most frequent value from the table,

    SELECT customer_id, COUNT(customer_id) FROM table_name GROUP BY 1 ORDER BY 2 DESC;

Using estimation function,

    SELECT APPROX_TOP_K(customer_id,5,20) FROM table_name;

Benchmark: For 150,000,000 rows

  •     Normal method took, 12s
  •     Estimation function took, 5.6s

Also check for:

APPROX_TOP_K_ACCUMULATE , 
APPROX_TOP_K_COMBINE, 
APPROX_TOP_K_ESTIMATE

Estimating the Number of Distinct Values

Normal method to get the distinct values,

    SELECT COUNT(DISTINCT customer_name) FROM table_name;

Using estimation function,

    SELECT HLL(customer_name) FROM table_name;


Benchmark: For 150,000,000 rows

  •     Normal method took, 12s
  •     Estimation function took, 5.6s

Thursday, 7 August 2025

Stage and Copy - Useful Commands

To show the list of stages: 

  • SHOW STAGES;
  • DESC STAGE name_of_the_stage;

To show the list of file formats : 

  • SHOW FILE FORMATS;
  • DESC FILE FORMAT name_of_the_file_format;

Sample file format creation :

CREATE OR REPLACE FILE FORMAT emp_csv_format

  TYPE = 'CSV'

  FIELD_DELIMITER = ','

  SKIP_HEADER = 1

  FIELD_OPTIONALLY_ENCLOSED_BY = '"'

  NULL_IF = ('NULL', 'null');

Sample file format alter :

ALTER FILE FORMAT emp_csv_format

SET FIELD_DELIMITER = '|';

List the files in the stage :

LIST @emp_named_stage;


Sample stage creation :

CREATE OR REPLACE STAGE emp_named_stage

  FILE_FORMAT = emp_csv_format;

Read data from the stage :

SELECT

        METADATA$FILENAME,

        METADATA$FILE_ROW_NUMBER,

        $1 AS column1_name, -- Referencing the first column

        $2 AS column2_name,  -- Referencing the second column

        $3 AS Age,

        $4 AS City,

        $5 AS unknowncolumn

    FROM @emp_named_stage (file_format => emp_csv_format) AS t;

Copy the data from stage to table :

COPY INTO emp_data FROM @emp_named_stage/samplecsv_error.csv ON_ERROR='continue';

Remove the file from stage :

REMOVE @emp_named_stage/samplecsv_error1.csv;

Validation mode :

COPY INTO emp_data

FROM @emp_named_stage/samplecsv_error1.csv

FILE_FORMAT = emp_csv_format

VALIDATION_MODE = 'RETURN_ERRORS';

To check the history of file loads :

SELECT * FROM TABLE(information_schema.copy_history(TABLE_NAME => 'emp_data', START_TIME => DATEADD(hours, -1, CURRENT_TIMESTAMP())));



Tuesday, 5 August 2025

Stored Procedure - Language Selection SQL vs Javascript

 Choosing between LANGUAGE JAVASCRIPT and LANGUAGE SQL (Snowflake Scripting) for stored procedures depends on what you’re trying to accomplish.


Use SQL scripting when:

  • You're mostly executing SQL statements.

  • You want clean, readable procedures.

  • You're doing DML (INSERT, UPDATE, DELETE).


Use JavaScript when:

  • You need complex procedural logic not easily expressed in SQL.

  • You want to build dynamic SQL more flexibly.

  • You want to loop through result sets or arrays.


Snowflake - Stored Procedures [ Error free script - ready to use]

 1. Basic Stored Procedure


create or replace procedure proc_return_str()

  RETURNS VARCHAR

  LANGUAGE javascript

  EXECUTE AS CALLER

  AS '

  return "Hello from the snowflake stored procedure!";

  ';


call proc_return_str();


2. Stored Procedure with Parameters 


CREATE OR REPLACE PROCEDURE proc_withparampass(message VARCHAR)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
  RETURN 'Hello from Snowflake! You passed: ' || message;
END;
$$;

CALL proc_withparampass('Snowflake Stored Procedures');

3. Stored Procedures with Update SQL statement


CREATE OR REPLACE PROCEDURE apply_bonus(bonus_percentage DECIMAL(5, 2), min_rating INT)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
  rows_updated INT;
BEGIN
  UPDATE employees
  SET tot_salary = salary * (1 + :bonus_percentage / 100)
  WHERE performance_rating >= :min_rating;
  rows_updated := SQLROWCOUNT;
  RETURN 'Bonus applied to ' || rows_updated || ' employees.';
END;
$$;

CALL apply_bonus(5, 5);

4. Stored Procedures with Return Output as Table


CREATE OR REPLACE PROCEDURE proc_return_table(min_salary DECIMAL(10, 2))
RETURNS TABLE(emp_id INT, salary DECIMAL(10, 2))
LANGUAGE SQL
AS
$$
DECLARE
  res RESULTSET;
BEGIN
  res := (SELECT emp_id, salary FROM employees WHERE salary >= :min_salary ORDER BY salary DESC);
  RETURN TABLE(res);
END;
$$;

CALL proc_return_table(70000);


5. Stored Procedures with Insert SQL statement


CREATE OR REPLACE PROCEDURE proc_calc_writein_insertinothertable(
    start_date DATE,
    end_date DATE,
    in_region STRING
)
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
    v_total_sales NUMBER;
    v_total_orders NUMBER;
    v_avg_order_value NUMBER;
BEGIN

    INSERT INTO SALES_SUMMARY_LOG (
        RUN_TIMESTAMP, SALES_REGION, START_DATE, END_DATE,
        TOTAL_SALES, TOTAL_ORDERS, AVERAGE_ORDER_VALUE, STATUS
    )
    SELECT CURRENT_TIMESTAMP,:in_region, :start_date, :end_date,
        SUM(AMOUNT) AS v_total_sales, COUNT(*) AS v_total_orders, AVG(AMOUNT) AS v_avg_order_value
        ,'SUCCESS'
    FROM SALES
    WHERE ORDER_DATE BETWEEN :start_date AND :end_date
      AND SALES_REGION = :in_region;
      

    RETURN 'Summary calculation completed successfully for region: ' || :in_region ;

EXCEPTION
    WHEN OTHER THEN
        INSERT INTO SALES_SUMMARY_LOG (
            RUN_TIMESTAMP, SALES_REGION, START_DATE, END_DATE,
            TOTAL_SALES, TOTAL_ORDERS, AVERAGE_ORDER_VALUE, STATUS
        )
        VALUES (
            CURRENT_TIMESTAMP, :in_region, :start_date, :end_date,
            NULL, NULL, NULL, 'FAILED'
        );
        RETURN 'An error occurred during summary calculation.';
END;
$$;

CALL proc_calc_writein_insertinothertable('2025-01-02', '2025-01-02', 'West');

6. Stored Procedures with Calculation


Eg: 1


CREATE OR REPLACE PROCEDURE proc_simple_num_calc()
RETURNS NUMBER
LANGUAGE SQL
AS
$$
DECLARE
    profit NUMBER(38, 2) DEFAULT 0.0; -- Variable declared with a default value
    cost NUMBER(38, 2);              -- Variable declared without a default value (will be NULL initially)
BEGIN
    cost := 100.0;
    LET revenue NUMBER(38, 2) DEFAULT 110.0; -- LET can also be used inside BEGIN...END
    profit := revenue - cost;
    RETURN profit;
END;
$$;

CALL proc_simple_num_calc();


Eg: 2


CREATE OR REPLACE PROCEDURE calculate_revenue()
RETURNS NUMBER
LANGUAGE SQL
AS
$$
BEGIN
    LET sales_amount NUMBER(38, 2) := 500.0; -- Declared and initialized using :=
    LET discount_rate NUMBER(38, 2) DEFAULT 0.10; -- Declared and initialized using DEFAULT
    LET net_revenue NUMBER(38, 2);

    net_revenue := sales_amount * (1 - discount_rate);
    RETURN net_revenue;
END;
$$;

CALL calculate_revenue();


7. Stored Procedure with Cursor & Get count of a table:

CREATE OR REPLACE PROCEDURE proc_cursor_getcountofthetable(table_name STRING)

RETURNS STRING

LANGUAGE SQL

AS

$$

DECLARE

  row_count INT DEFAULT 0;

  C1 CURSOR FOR SELECT COUNT(*) FROM customer_src;

BEGIN

  OPEN C1;

  FETCH C1 INTO row_count;

  CLOSE C1;

  IF (row_count > 4) THEN

    RETURN 'Success';

  ELSE

    RETURN 'Failed';

  END IF;

END;

$$;


CALL proc_cursor_getcountofthetable('customer_src');


8. Stored Procedure with Execute Immediate & Dynamic SQL:


CREATE OR REPLACE PROCEDURE proc_with_ex_immediate_dynamic_sql(

    minimum_price NUMBER(12,2),

    maximum_price NUMBER(12,2))

  RETURNS TABLE (order_id NUMBER(38,0), SALES_REGION VARCHAR, ORDER_DATE DATE, amount NUMBER(38, 0))

--Give all the columns from the table, with exact data type and sequence  

  LANGUAGE SQL

AS

$$

DECLARE

  rs RESULTSET;

  query VARCHAR DEFAULT 'SELECT * FROM SALES WHERE amount > ? AND amount < ?';

BEGIN

  rs := (EXECUTE IMMEDIATE :query USING (minimum_price, maximum_price));

  RETURN TABLE(rs);

END;

$$

;


CALL proc_with_ex_immediate_dynamic_sql(1000,1500);

