Presto Specific

Don’t SELECT *, Specify explicit column names (columnar store)
Avoid large JOINs (filter each table first)
- In PRESTO tables are joined in the order they are listed!!
- Join small tables earlier in the plan and leave larger fact tables to the end
- Avoid cross joins or 1 to many joins as these can degrade performance
Order by and group by take time
- only use order by in subqueries if it is really necessary
When using GROUP BY, order the columns by the highest cardinality (that is, most number of unique values) to the lowest.
Use approx_distinct() instead of count(distinct) for very large datasets
Use approx_percentile(metric, 0.5) for median
Avoid UNIONs where possible
Use WITH statements vs. nested subqueries
Presto broadcasts the right side table in joins, declare larger tables first and filter right side tables to as small as possible
LIKE takes time, in particular when you add %s on both sides
- Use REGEXP_LIKE() if multiple like statements
- l_comment LIKE '%wake%' OR l_comment LIKE '%regular%' OR l_comment LIKE '%express%' --> regexp_like(l_comment, 'wake|regular|express')
SHOW COLUMNS FROM table_name;
SELECT * SYSTEM.JDBC.TABLES
- Can also use COLUMNS, CATALOGS, SCHEMAS
- Use WHERE COLUMN_NAME/TABLE_NAME Like ‘%client%’
SHOW TABLES FROM SYSTEM.JDBC

References

General SQL tips