Last active
December 31, 2020 06:04
-
-
Save nelsonsequiera/d92d2cd1f1a6611fb5850916ca97d26f to your computer and use it in GitHub Desktop.
Redshift best distribution key test
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* get current sort and dist keys | |
* Materialize a single column to check distribution | |
* Identify the table OID | |
* get skewed data stats |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT "column", | |
TYPE, | |
distkey, | |
sortkey | |
FROM pg_table_def | |
WHERE schemaname = 'public' | |
AND tablename = '<tablename>' | |
AND ( | |
distkey = TRUE | |
OR sortkey <> 0 | |
); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
CREATE TEMP TABLE t1 DISTKEY (<column>) AS | |
SELECT <column> FROM <tablename>; |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT 't1'::regclass::oid; | |
-- gives an oid back |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT "table" tablename, skew_rows, | |
ROUND(CAST(max_blocks_per_slice AS FLOAT) / | |
GREATEST(NVL(min_blocks_per_slice,0)::int,1)::FLOAT,5) storage_skew, | |
ROUND(CAST(100*dist_slice AS FLOAT) / | |
(SELECT COUNT(DISTINCT slice) FROM stv_slices),2) pct_populated | |
FROM svv_table_info ti | |
JOIN (SELECT tbl, MIN(c) min_blocks_per_slice, | |
MAX(c) max_blocks_per_slice, | |
COUNT(DISTINCT slice) dist_slice | |
FROM (SELECT b.tbl, b.slice, COUNT(*) AS c | |
FROM STV_BLOCKLIST b | |
GROUP BY b.tbl, b.slice) | |
WHERE tbl = <oid> GROUP BY tbl) iq ON iq.tbl = ti.table_id; | |
/* | |
sample output of bad dist key: | |
tablename | skew_rows | storage_skew | pct_populated | |
-----------+-----------+--------------+--------------- | |
t1 | 42.81 | 21.97688 | 66.67 | |
sample output of good dist key: | |
tablename | skew_rows | storage_skew | pct_populated | |
-----------+-----------+--------------+--------------- | |
t2 | 1.00 | 1.00063 | 66.67 | |
*/ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment