- Define the order in which tasks should run
- Tasks can be upstream (run before) or downstream (run after)
- Declared after creating the tasks
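- Recommended: bitshift operators, e.g. task_1 >> task_2 (equivalently task_2 << task_1)
- Alternative: the explicit methods, e.g. task_1.set_downstream(task_2) or task_2.set_upstream(task_1)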
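You do not need Airflow installed to see how these declarations turn into a run order. The sketch below uses a hypothetical `Task` class (not Airflow's real operator classes). It mimics the bitshift syntax (`>>`, `<<`) and the `set_downstream` / `set_upstream` methods, then computes one valid execution order with the standard-library `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter


class Task:
    """Hypothetical stand-in for an Airflow task; it only records dependencies."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: set[str] = set()    # tasks that must run before this one
        self.downstream: set[str] = set()  # tasks that run after this one

    def set_downstream(self, other: "Task") -> None:
        # `self` must finish before `other` starts.
        self.downstream.add(other.task_id)
        other.upstream.add(self.task_id)

    def set_upstream(self, other: "Task") -> None:
        # `other` must finish before `self` starts.
        other.set_downstream(self)

    def __rshift__(self, other: "Task") -> "Task":
        # a >> b: a runs before b. Returning `other` allows chaining a >> b >> c.
        self.set_downstream(other)
        return other

    def __lshift__(self, other: "Task") -> "Task":
        # a << b: b runs before a.
        self.set_upstream(other)
        return other


def run_order(tasks: list[Task]) -> list[str]:
    """Return one valid execution order, upstream tasks first."""
    graph = {t.task_id: t.upstream for t in tasks}
    return list(TopologicalSorter(graph).static_order())


# Tasks are created first; dependencies are declared afterwards.
extract, transform, load = Task("extract"), Task("transform"), Task("load")
extract >> transform >> load           # bitshift style
# transform.set_downstream(load)       # explicit-method style, same effect

print(run_order([extract, transform, load]))  # ['extract', 'transform', 'load']
```

In a real DAG the same `extract >> transform >> load` line is written between operator instances, and the scheduler uses the resulting graph to decide when each task may start.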
KAFKA - EVENT PROCESSING SYSTEM
Topics
- A particular stream of data
- Identified by name
  (similar to a table in a database)
- Support any message format (e.g. JSON, Avro, plain text)
- The sequence of messages is called a data stream
- You cannot query topics; instead, use Kafka producers to send data and Kafka consumers to read it
- Kafka topics are immutable: once data is written to a partition, it cannot be changed
- Data is kept only for a limited time (default: one week, configurable)
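The one-week default corresponds to a topic-level `retention.ms` of 604,800,000 ms (7 × 24 × 60 × 60 × 1000). The sketch below models only that time check. `is_expired` is a hypothetical helper, and a real broker deletes whole log segments rather than individual messages.

```python
DEFAULT_RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # one week = 604,800,000 ms


def is_expired(message_ts_ms: int, now_ms: int,
               retention_ms: int = DEFAULT_RETENTION_MS) -> bool:
    """True if a message is older than the retention window (hypothetical helper)."""
    return now_ms - message_ts_ms > retention_ms


day_ms = 24 * 60 * 60 * 1000
now = 30 * day_ms

print(DEFAULT_RETENTION_MS)                                      # 604800000
print(is_expired(now - 6 * day_ms, now))                         # False: 6 days old
print(is_expired(now - 8 * day_ms, now))                         # True: 8 days old
print(is_expired(now - 2 * day_ms, now, retention_ms=day_ms))    # True with 1-day retention
```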
Partitions
- Topics are split into partitions
- Messages within each partition are ordered (there is no ordering guarantee across partitions)
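Ordering holds inside a partition, not across the whole topic. Here is a minimal in-memory model. It assumes a hypothetical `Topic` class in which each partition is an append-only list:

```python
class Topic:
    """Hypothetical in-memory topic: a fixed number of append-only partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: list[list[str]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, message: str) -> None:
        # Messages are only ever appended; existing entries are never modified.
        self.partitions[partition].append(message)


orders = Topic("orders", num_partitions=2)
# Send five messages in this global sequence, spread over the two partitions.
for i, p in enumerate([0, 1, 1, 0, 1]):
    orders.append(p, f"msg-{i}")

print(orders.partitions[0])  # ['msg-0', 'msg-3']: in send order
print(orders.partitions[1])  # ['msg-1', 'msg-2', 'msg-4']: in send order
# A consumer that reads partition 1 before partition 0 sees msg-1 before msg-0,
# so the topic as a whole has no single global order.
```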
Offset
- Each message within a partition gets an incremental id, called an offset
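Offsets are counted per partition. Each partition starts at 0 independently, so an offset identifies a message only together with its topic and partition. The sketch below assumes a hypothetical `Partition` class. Its `append` returns the new message's offset, and its `read_from` mimics a consumer resuming from a stored offset:

```python
class Partition:
    """Hypothetical append-only log; a message's offset is its position in the log."""

    def __init__(self) -> None:
        self._log: list[str] = []

    def append(self, message: str) -> int:
        self._log.append(message)
        return len(self._log) - 1  # offsets start at 0 and only increase

    def read_from(self, offset: int) -> list[tuple[int, str]]:
        # Return (offset, message) pairs starting at `offset`, like a consumer
        # continuing from its last committed position.
        return list(enumerate(self._log))[offset:]


p0, p1 = Partition(), Partition()
print([p0.append(m) for m in ["a", "b", "c"]])  # [0, 1, 2]
print(p1.append("x"))                           # 0: p1 numbers its own messages
print(p0.read_from(1))                          # [(1, 'b'), (2, 'c')]
```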
Producers
- Write data to topics
- Producers know which partition to write to
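When a message has a key, the producer hashes the key and maps the hash onto a partition. All messages with the same key therefore land in the same partition and keep their order. Kafka's default partitioner uses the murmur2 hash. The sketch below swaps in `zlib.crc32` purely for illustration, so its partition numbers will not match a real cluster. `choose_partition` is a hypothetical helper.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition (illustrative: crc32 instead of murmur2)."""
    # Same key -> same hash -> same partition, as long as num_partitions is unchanged.
    return zlib.crc32(key) % num_partitions


events = [(b"user-42", "login"), (b"user-7", "login"),
          (b"user-42", "click"), (b"user-42", "logout")]

by_partition: dict[int, list[str]] = {}
for key, value in events:
    p = choose_partition(key, num_partitions=3)
    by_partition.setdefault(p, []).append(f"{key.decode()}:{value}")

for p, msgs in sorted(by_partition.items()):
    print(p, msgs)
# All user-42 events share one partition, in the order they were produced.
```

Adding partitions to a topic changes the key-to-partition mapping, which is why the number of partitions is usually chosen up front.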
Kafka Connect
- Moves data into and out of Kafka (from and to external systems such as databases)
Step-by-Step to Start Kafka
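Secure Data Sharing (Snowflake)
-- Provider account: create the share and grant access to the objects being shared
CREATE SHARE my_share;
GRANT USAGE ON DATABASE my_db TO SHARE my_share;
GRANT USAGE ON SCHEMA my_db.my_schema TO SHARE my_share;
GRANT SELECT ON TABLE my_db.my_schema.my_table TO SHARE my_share;
-- Provider account: make the share available to the consumer account
ALTER SHARE my_share ADD ACCOUNTS = a123bc;
-- Consumer account: create a read-only database from the share
-- (<provider_account> is a placeholder for the provider's account identifier)
CREATE DATABASE my_db FROM SHARE <provider_account>.my_share;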
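To view the refresh (credit usage) history of materialized views:
-- Information Schema table functions must be wrapped in TABLE()
SELECT * FROM TABLE(information_schema.materialized_view_refresh_history());
SELECT * FROM TABLE(information_schema.materialized_view_refresh_history(materialized_view_name => 'mname'));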
SELECT * FROM snowflake.account_usage.materialized_view_refresh_history;
Resizing: warehouses can be resized even while queries are running or while the warehouse is suspended.
A resize affects only queued and new queries, not the queries already running.
Scale Up vs Scale Out:
Scale Up (resize to a larger warehouse) - for more complex queries
Scale Out (add clusters to a multi-cluster warehouse) - for more users (more concurrent queries)
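A rough cost comparison can help when choosing between the two. Each size step of a standard Snowflake warehouse doubles the credit rate (X-Small = 1 credit/hour, Small = 2, Medium = 4, Large = 8, ...). A multi-cluster warehouse bills each running cluster separately. The sketch below is a simplified model: it ignores per-second billing, the 60-second minimum, and clusters starting or stopping automatically. The `credits` helper is hypothetical.

```python
# Credits per hour for one cluster of a standard warehouse (doubles per size step).
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}


def credits(size: str, hours: float, clusters: int = 1) -> float:
    """Simplified credit estimate: hourly rate x running clusters x hours."""
    return CREDITS_PER_HOUR[size] * clusters * hours


# Scale up: one bigger cluster, so each complex query gets more compute.
print(credits("LARGE", hours=2))                # 16
# Scale out: more clusters of the same size, so more queries run concurrently.
print(credits("MEDIUM", hours=2, clusters=3))   # 24
```

Scaling up helps when a single heavy query is slow. Scaling out helps only when queries are queueing because too many run at the same time.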