- Don’t SELECT *, Specify explicit column names (columnar store)
- Avoid large JOINs (filter each table first)
- In PRESTO tables are joined in the order they are listed!!
- Join small tables earlier in the plan and leave larger fact tables to the end
- Avoid cross joins or 1 to many joins as these can degrade performance
 
- Order by and group by take time
- only use order by in subqueries if it is really necessary
 
- When using GROUP BY, order the columns by the highest cardinality (that is, most number of unique values) to the lowest.
- Use approx_distinct() instead of count(distinct) for very large datasets
- Use approx_percentile(metric, 0.5) for median
- Avoid UNIONs where possible
- Use WITH statements vs. nested subqueries
- Presto broadcasts the right side table in joins, declare larger tables first and filter right side tables to as small as possible
- LIKE takes time, in particular when you add %s on both sides
- Use REGEXP_LIKE() if multiple like statements
- l_comment LIKE '%wake%' OR l_comment LIKE '%regular%' OR l_comment LIKE '%express%'-->- regexp_like(l_comment, 'wake|regular|express')
 
- SHOW COLUMNS FROM table_name;
- SELECT * SYSTEM.JDBC.TABLES- Can also use COLUMNS, CATALOGS, SCHEMAS
- Use WHERE COLUMN_NAME/TABLE_NAME Like ‘%client%’
 
- Can also use 
- SHOW TABLES FROM SYSTEM.JDBC
References
- start analysis on a small dataset
- Use the query plan (EXPLAIN)
- ALIAS tables in the case of multiple joins