Skip to content

Instantly share code, notes, and snippets.

View nelsonsequiera's full-sized avatar
🤷‍♂️
Experiment - Fail - Learn - Repeat

Nelson Sequiera nelsonsequiera

🤷‍♂️
Experiment - Fail - Learn - Repeat
View GitHub Profile
@nelsonsequiera
nelsonsequiera / Redshift usefull queries.txt
Last active December 31, 2020 06:04
Redshift usefull Queries
Redshift usefull queries
@nelsonsequiera
nelsonsequiera / distribution style.md
Last active January 22, 2025 01:41
redshift best suggested recommended column encoding by data types and by encoding types. column encodings: raw AZ64 Byte-dictionary Delta LZO Mostly Runlength Text255 ZSTD Data types:BOOLEAN DOUBLE SMALLINT INTEGER BIGINT DECIMAL REAL DOUBLE CHAR VARCHAR DATE TIMESTAMP TIMESTAMPTZ TIME
style use case
key same value same slice, for joins
all dimension small table for frequent joins, data is copied to all nodes first slice
even unknown cases
  • Set DISTKEY to the column most used in a JOIN
  • Set SORTKEY to the column(s) most used in a WHERE
@nelsonsequiera
nelsonsequiera / _0 redshift column encoding test.md
Created December 22, 2020 17:21
redshift testing column encoding selecting best column encoding
  • get the column count
  • create a new table with encodings you want to test
  • insert the same column to all the column in new table
  • Query the STV_BLOCKLIST system table
  • encoding with lowest storage block count is the best encoding.
@nelsonsequiera
nelsonsequiera / 00_Redshift best distribution key test.txt
Last active December 31, 2020 06:04
Redshift best distribution key test
* get current sort and dist keys
* Materialize a single column to check distribution
* Identify the table OID
* get skewed data stats