- Don’t
SELECT *
, Specify explicit column names (columnar store) - Avoid large JOINs (filter each table first)
- In PRESTO tables are joined in the order they are listed!!
- Join small tables earlier in the plan and leave larger fact tables to the end
- Avoid cross joins or 1 to many joins as these can degrade performance
- Order by and group by take time
- only use order by in subqueries if it is really necessary
- When using GROUP BY, order the columns by the highest cardinality (that is, most number of unique values) to the lowest.
- Use approx_distinct() instead of count(distinct) for very large datasets
- Use approx_percentile(metric, 0.5) for median
- Avoid UNIONs where possible
- Use WITH statements vs. nested subqueries
- Presto broadcasts the right side table in joins, declare larger tables first and filter right side tables to as small as possible
- LIKE takes time, in particular when you add %s on both sides
- Use REGEXP_LIKE() if multiple like statements
l_comment LIKE '%wake%' OR l_comment LIKE '%regular%' OR l_comment LIKE '%express%'
-->regexp_like(l_comment, 'wake|regular|express')
SHOW COLUMNS FROM table_name;
SELECT * SYSTEM.JDBC.TABLES
- Can also use
COLUMNS, CATALOGS, SCHEMAS
- Use
WHERE COLUMN_NAME/TABLE_NAME Like ‘%client%’
- Can also use
SHOW TABLES FROM SYSTEM.JDBC
References
- start analysis on a small dataset
- Use the query plan (EXPLAIN)
- ALIAS tables in the case of multiple joins