"Data sampling" refers to selecting a subset of data from a larger dataset, typically for testing, analysis, or performance purposes.
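- ROW or BERNOULLI
  - Each row is included independently with probability p%
  - More "random": the decision is made row by row
  - Better suited to smaller tables
  - e.g. SELECT * FROM table_name SAMPLE ROW (<p>) SEED (15);
- BLOCK or SYSTEM
  - Each block of rows is included with probability p%
  - More efficient, since whole blocks are kept or skipped, but the sample can be less uniform
  - Better suited to larger tables
  - e.g. SELECT * FROM table_name SAMPLE SYSTEM (<p>) SEED (15);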
Here, <p> is the sampling percentage: the query returns approximately p% of the table's rows, chosen at random. SEED makes the sample repeatable across runs.
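To make the difference between the two methods concrete, here is a small Python sketch that simulates them on an in-memory list. It is only a toy model under stated assumptions, not how any database engine implements sampling: the function names `sample_rows` and `sample_blocks` and the `block_size` parameter are made up for illustration, and real storage blocks vary in size.

```python
import random


def sample_rows(rows, p, seed=None):
    """ROW/BERNOULLI-style: keep each row independently with probability p%."""
    rng = random.Random(seed)  # fixed seed -> repeatable sample
    return [row for row in rows if rng.random() < p / 100]


def sample_blocks(rows, p, block_size, seed=None):
    """BLOCK/SYSTEM-style: keep each whole block with probability p%.

    block_size is a hypothetical stand-in for a storage block.
    """
    rng = random.Random(seed)
    sampled = []
    for start in range(0, len(rows), block_size):
        # One random draw per block, not per row.
        if rng.random() < p / 100:
            sampled.extend(rows[start:start + block_size])
    return sampled


rows = list(range(10_000))
row_sample = sample_rows(rows, p=10, seed=15)
block_sample = sample_blocks(rows, p=10, block_size=1_000, seed=15)
print(len(row_sample), len(block_sample))
```

In this sketch the row sample makes 10,000 random decisions while the block sample makes only 10, which is why block sampling tends to be cheaper on large tables. The block sample's size always comes in multiples of the block size, so on a small table it can land far from p%.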