Skip to content

Instantly share code, notes, and snippets.

@just3ws
Created April 19, 2017 14:59
Show Gist options
  • Save just3ws/b1a2e767f5b8abd7ad9a7ead06ae9e30 to your computer and use it in GitHub Desktop.
Save just3ws/b1a2e767f5b8abd7ad9a7ead06ae9e30 to your computer and use it in GitHub Desktop.
Postgres EXPLAIN Lunch & Learn @ BenchPrep

Postgres EXPLAIN Lunch & Learn @ BenchPrep

EXPLAIN Explained video on YouTube

What EXPLAIN doesn't do

  • Tell you why a particular index isn't used
  • Explain how to rewrite your queries
  • Show what other factors make the DB slow
  • Tell you how much time the request took outside the DB

The query planner

  1. Break the query down into atomic "nodes"
  2. Figure out every possible way to execute each "node" and it's cost
  3. Chain the combinations together into plans
  4. Calculate the total "cost" of each plan
  5. Pick the plan with the lowest "cost"

Nodes

(Rowset + Parameters) =(Node Operation)=> Rowset

Sort node

(Scan rows + Sort columns) =(Sort)=> Sorted rows

$cost$

A figure in completely arbitrary cost units (float) which is intended to represent estimated time and system resources required to execute the query based on dubious theor and proven practice.

(some people adjust seq_page_cost)

EXPLAIN vs EXPLAIN ANALYZE

  • EXPLAIN shows what the planner planned to do.
  • EXPLAIN ANALYZE shows the plan, executes the query, then shows how it did.
  • Most of the time you want to run EXPLAIN ANALYZE unless:
    • The query never finishes.
    • You're just checking index usage.
    • Your just checking for partition scans.

Other EXPLAIN options

  • buffers
  • costs & timing
  • format (yaml, xml, json)
  • verbose
explain (analyze on, verbose on, format yaml)

NOTE

The explain output is an inverted tree. Read from inside to outside.

Node Types

Scans

Most will contain a filter condition

Seq Scan

Full-table scan

Index Scan

  • Scan index
  • Look up tuples in table

Index-Only Scan

  • Scan index
  • Ignore Heap

Bitmap Scan

Bitmap Heap Scan, Bitmap Index Scan

  • Scan index
  • Build bitmap of compliant rows

Joins

NestLoop

  • For each row in Set A:
    • Look up rows in Set B
# Essentially...
for each row in input_a
  for each row in input_b
    do something
  next
next
# Ref: https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT

Hash Join

  • Make a Hash Table of Set A:
    • Match groups of rows in Set B

Merge Join

  • Sort both Set A and Set B
    • The "zip" them together

Lateral Join

  • Join against table expressions

Semi-Join

  • Partial join, for outer join

Anti-Join

  • Exclusion, such as for NOT IN()

Aggregates

GroupAggregate

  • Regular aggregation

HashAggregate

  • In-memory aggregate using hash tables

WindowAgg

  • Used for windowing clause expressions

Other operations

Unique

  • De-deduplicate rows, like for DISTINCT or UNION

Sort

  • Sort

Limit

  • Supports LIMIT/OFFSET

CTEScan

  • Joins a CTE (WITH clause) to the main query

SubqueryScan

  • Same as CTEScan but for subqueries

Materialize

  • Create a recordset in-memory from a query fragment

Append

  • Merge rowsets for UNION ALL and partitions

Hints when reading EXPLAIN

  • Seq Scan usually can mean an index is needed on the scanned columns
  • Index Scan usually can mean there was a bad row estimate (?)
  • Nested Loops look at time consumption, rows, loop count
  • Index Scan look at time consumption, rows, loop count

Tweaking

  1. Run EXPLAIN
  2. Toggle options based on initial output
  3. GOTO 1

Automatically check

F.3. auto_explain

The auto_explain module provides a means for logging execution plans of slow statements automatically, without having to run EXPLAIN by hand. This is especially helpful for tracking down un-optimized queries in large applications.

The module provides no SQL-accessible functions. To use it, simply load it into the server. You can load it into an individual session:

LOAD 'auto_explain'; (You must be superuser to do that.) More typical usage is to preload it into some or all sessions by including auto_explain in session_preload_libraries or shared_preload_libraries in postgresql.conf. Then you can track unexpectedly slow queries no matter when they happen. Of course there is a price in overhead for that.

Reading

Other tools

User.where(:id => 1).joins(:posts).explain
config.active_record.auto_explain_threshold_in_seconds = 0.5

Help to kill N+1 queries and unused eager loading.

A performance dashboard for Postgres

Parses SQL queries using a copy of the PostgreSQL server query parser

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment