->
Marks the start of info on a "plan node"- Work from the leaves to the root to understand what happened first
(cost=0.00..5.04 rows=101 width=0)
(cost=
- The cost estimations are in units of arbitrary comptutation; by default the cost of a sequential page fetch. See these variables for how this can be controlled.
0.00..
- estimated start-up cost before output can begin. E.g. the sorting in a Sort node...5.04
- estimated total cost assuming the node runs to completion. Includes the start-up cost (as shown by this).
rows=101
is the estimated number of rows that will be emitted by this node, assuming the node runs to completion. Maybe reduced if aFilter
will be executed.width=0)
is the estimated average size in bytes of a row returned by this plan node.
(actual time=0.049..0.049 rows=100 loops=1)
When using EXPLAIN ANALYZE
because the query is actually executed.
actual time=
- is an average of all the loops, multiply by loops to find the total time spent.nnn...
- start-up time in ms of real time....nnn
- total time in ms of real time.
rows
- how many rows returned, is an average per-execution if there are loops.loops
- how many times the operation was executed
Filter: unique1 > 100
this means a plan node checks a condition for each row it scans. Will cause a reduction in estimatedrows
returned, but will increase thecost
because it will add (rows * cpu_operator_cost
).Join Filter
- says this if we're using some kind ofOUTER JOIN
When using EXPLAIN (ANALYZE, BUFFERS)
(see EXPLAIN Parameters):
Helps to identify which parts of the query are the most I/O-intensive.
The unit is blocks
, the stats are hit
, read
, dirtied
and written
for:
shared
blocks contain data from regular tables and indexeslocal
blocks contain data from temporary tables and indexestemp
blocks contain short-term data for sorts, hashes, Materialize etc.
Meaning:
hit
- a read was avoided because a block was already found in cachedirtied
- the number of previously unmodified blocks changed by this querywritten
- previously dirtied blocks evicted from cache during this query
Parent nodes include the sum of the values for their children.
In text mode assume 0 values if not shown.
The following variables are used to estimates query costs. See their defintions on Planner Cost Constants.
Here are the defaults:
var | default | desc |
---|---|---|
seq_page_cost | 1.0 | cost of a disk page fetch |
random_page_cost | 4.0 | cost of a non-sequentially-fetched disk page |
cpu_tuple_cost | 0.01 | cost of processing each row during a query |
cpu_index_tuple_cost | 0.005 | cost of processing each index entry during an index scan |
cpu_operator_cost | 0.0025 | cost of processing each operator or function executed during a query |
E.g. find number of pages and rows:
SELECT relpages, reltuples FROM pg_class WHERE relname = 'my_table';
relpages | reltuples
----------+-----------
3920 | 48608
(1 row)
Then plug those into the following for a SELECT * FROM ...
:
total_cost = ("disk pages read" * seq_page_cost) + (rows scanned * cpu_tuple_cost)
See full example on Using EXPLAIN.
Scan nodes return raw rows from a table:
- Seq Scan - iterate rows of the table filtering
- Index Scan - search the index, fetches rows from the table in index order, so may cause random I/O to load pages.
- Index Only Scan - when all the needed cols are in the index
- Bitmap Scans - Used to avoid random I/O caused by fetching rows from an index. Table rows are
fetched in physical order.
Is always structured as a Bitmap Heap Scan with one or more nested Bitmap Index Scans. Multiple
Bitmap Index Scans will be combined using BitmapOr or BitmapAnd.
- Bitmap Index Scan - (inner) using an index, produces a truth table of disk pages that might contain the rows we need.
- Bitmap Heap Scan - (outer) sorts the row locations into physical order before reading the pages indicated by the truth table into memory. The "bitmap" is what does the sorting.
- Function Scan - a scan over tuples generated by a function
- Hash Join / Hash - A join implemented using a Hash table.
- Hash first calls a sub-operation to create a hash table keyed by the join condition
- Hash Join then runs a second sub-operation and filters its results through the hash table
Memory Usage
is peak usedBatches
will be 1 if all in memory, > 1 means disk usage was required.
- Merge Join - Used if joined datasets can be or are already sorted using the join key. Always has two sub-operations Sort and Materialize.
- Nested Loop Join - A join. Always has two sub-operations: first node produces a list of rows; then for every row that returns, it runs the second operation.
- Nested Loop Semi Join - Where inner result set data is not returned, only used to filter. Most often seen when using EXISTS.
- HashAggregate - aggregate rows by a grouping key then digest them into one row per group by
performing AVG, SUM, COUNT etc. Uses
work_mem
to allocate a Hash table. - Unique - uses a HashAggregate if the data is unsorted. If the data is sorted can be perform
a memory-cheap
if last rows was the same
check. - GroupAggregate - if the data is sorted skips using a hash map and uses the same check-last-row algorithm Unique uses.
- Sort - sorting, either in-memory if
work_mem
is big enough or on disk - Limit - return just the first N rows. Usually stopping sub-operations asap.
- Materialize - essentially runs an operation then stores its rows in memory for repeat processing.
- CTE - execution of a CTE
- CTE Scan - like Materialize, but this time with results returned by a CTE.
- Append - will be seen with queries that do something like
UNION ALL
where no duplicate elimation is happening, and we are simply aggregating the results of several sub-operations. - Gather - gathers results from a parallel query execution. Will have exactly one child node which is the node that will be executed in parallel. Reads tuples from workers in whatever order they appear, destroying sort order.
- Gather Merge - same as Gather but used when the child node will return tuples in sorted order, so the leader will perform an order preserving merge.
- https://explain.dalibo.com/ - a handy tool for analysing EXPLAIN output
- PostgreSQL: Documentation: 13: 14.1. Using EXPLAIN
- PostgreSQL: Documentation: 13: 50.5. Planner/Optimizer
- PostgreSQL: Documentation: 13: 70. How the Planner Uses Statistics
- PostgreSQL: Documentation: 13: EXPLAIN
- Reading a Postgres EXPLAIN ANALYZE Query Plan
- static void ExplainNode(...)
- Explaining the unexplainable – part 2 – select * from depesz; - Seq Scan, Index Scan etc.
- Explaining the unexplainable – part 3 – select * from depesz; - Hash Join, Hash, HashAggregate, Nested Loop, Merge Join etc.
- Explaining the unexplainable – part 4 – select * from depesz; - Append, Unique, Result etc.
- PostgreSQL: Documentation: 12: 15.1. How Parallel Query Works
- PostgreSQL: Documentation: 12: 15.3. Parallel Plans