This document demonstrates various PostgreSQL query plan types, with sample datasets, queries, and EXPLAIN/EXPLAIN ANALYZE outputs.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT,
age INT,
is_active BOOLEAN
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INT,
total INT,
created_at DATE,
FOREIGN KEY (user_id) REFERENCES users(id)
);
INSERT INTO users (name, age, is_active) VALUES
('Alice', 30, true),
('Bob', 45, false),
('Charlie', 25, true),
('Diana', 35, true),
('Eve', 40, false);
INSERT INTO orders (user_id, total, created_at) VALUES
(1, 100, '2024-01-01'),
(1, 200, '2024-01-15'),
(2, 50, '2024-02-01'),
(3, 150, '2024-03-01'),
(4, 300, '2024-03-15'),
(4, 80, '2024-04-01');
EXPLAIN SELECT * FROM users WHERE age > 30;
Output:
Seq Scan on users (cost=0.00..1.05 rows=2 width=32)
Filter: (age > 30)
CREATE INDEX idx_users_age ON users(age);
EXPLAIN SELECT * FROM users WHERE age > 30;
Output:
Index Scan using idx_users_age on users (cost=0.14..8.17 rows=2 width=32)
Index Cond: (age > 30)
EXPLAIN SELECT age FROM users WHERE age > 30;
Output:
Index Only Scan using idx_users_age on users (cost=0.14..8.17 rows=2 width=4)
Index Cond: (age > 30)
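An Index Only Scan can skip the table only for pages marked all-visible, so keeping the visibility map current matters; EXPLAIN ANALYZE reports a Heap Fetches counter for this node type:
VACUUM users;
EXPLAIN (ANALYZE) SELECT age FROM users WHERE age > 30;
-- Ideally the Index Only Scan node reports Heap Fetches: 0.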
EXPLAIN SELECT u.name, o.total
FROM users u JOIN orders o ON u.id = o.user_id;
Output:
Hash Join (cost=1.15..2.50 rows=6 width=48)
Hash Cond: (o.user_id = u.id)
-> Seq Scan on orders
-> Hash (cost=1.05..1.05 rows=5 width=32)
-> Seq Scan on users
EXPLAIN SELECT user_id, SUM(total) FROM orders GROUP BY user_id;
Output:
HashAggregate (cost=1.15..1.50 rows=3 width=12)
Group Key: user_id
-> Seq Scan on orders
EXPLAIN SELECT * FROM orders ORDER BY total DESC LIMIT 2;
Output (without index):
Limit
-> Sort
Sort Key: total DESC
-> Seq Scan on orders
Optimized:
CREATE INDEX idx_orders_total_desc ON orders(total DESC);
EXPLAIN SELECT * FROM orders ORDER BY total DESC LIMIT 2;
Optimized Output:
Limit
-> Index Scan using idx_orders_total_desc on orders
EXPLAIN SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE total > 150);
Output:
Hash Semi Join
Hash Cond: (users.id = orders.user_id)
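The next plan assumes an index on orders(user_id); PostgreSQL does not create one automatically for the foreign key, so add it explicitly:
CREATE INDEX orders_user_id_idx ON orders(user_id);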
EXPLAIN SELECT name FROM users WHERE EXISTS (
SELECT 1 FROM orders WHERE orders.user_id = users.id AND total > 150
);
Output:
Nested Loop Semi Join
-> Seq Scan on users
-> Index Scan using orders_user_id_idx on orders
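The recursive CTE below assumes a self-referencing employees table that is not part of the sample dataset above; a minimal sketch:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT,
manager_id INT REFERENCES employees(id)
);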
WITH RECURSIVE hierarchy AS (
SELECT id, name, manager_id, 1 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
SELECT e.id, e.name, e.manager_id, h.level + 1
FROM employees e
JOIN hierarchy h ON e.manager_id = h.id
)
SELECT * FROM hierarchy;
Output includes:
Recursive Union, WorkTable Scan, Nested Loop
SELECT u.name, o.id, o.total
FROM users u,
LATERAL (
SELECT * FROM orders WHERE orders.user_id = u.id ORDER BY total DESC LIMIT 1
) o;
Output:
Nested Loop
-> Seq Scan on users
-> Limit
-> Index Scan on orders
CREATE TABLE products (
id SERIAL PRIMARY KEY,
data JSONB
);
INSERT INTO products (data) VALUES
('{"category": "electronics", "price": 100}'),
('{"category": "clothing", "price": 40}');
CREATE INDEX idx_jsonb_category ON products ((data->>'category'));
EXPLAIN SELECT * FROM products WHERE data->>'category' = 'electronics';
Output:
Bitmap Heap Scan -> Bitmap Index Scan on idx_jsonb_category
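The expression index above is an ordinary B-tree, which is what the ->> equality filter needs (text has no default GIN operator class). A GIN index on the whole jsonb column is the alternative for containment queries, as in this sketch (index name is illustrative):
CREATE INDEX idx_products_data ON products USING gin (data);
EXPLAIN SELECT * FROM products WHERE data @> '{"category": "electronics"}';
For the partitioning example that follows, assume the earlier orders table has been dropped or renamed first, since the partitioned table reuses the name.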
CREATE TABLE orders (
id SERIAL,
user_id INT,
total INT,
created_at DATE
) PARTITION BY RANGE (created_at);
CREATE TABLE orders_2024 PARTITION OF orders FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE TABLE orders_2025 PARTITION OF orders FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');
EXPLAIN SELECT * FROM orders WHERE created_at = '2025-06-01';
Output:
Append -> Seq Scan on orders_2025
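Partition pruning is what limits the scan to orders_2025. To see the difference, you can temporarily disable it (a diagnostic trick, not a production setting):
SET enable_partition_pruning = off;
EXPLAIN SELECT * FROM orders WHERE created_at = '2025-06-01';
-- The Append node now scans every partition.
RESET enable_partition_pruning;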
EXPLAIN ANALYZE SELECT * FROM users WHERE age > 30;
Sample Output:
Seq Scan on users (actual time=0.012..0.018 rows=2 loops=1)
Filter: (age > 30)
Rows Removed by Filter: 3
Planning Time: 0.040 ms
Execution Time: 0.030 ms
This section dives deeper into performance, tuning, and advanced plan strategies.
Materialized views store query results physically and are great for expensive aggregations.
CREATE MATERIALIZED VIEW user_order_summary AS
SELECT user_id, COUNT(*) AS total_orders, SUM(total) AS total_spent
FROM orders
GROUP BY user_id;
-- Refresh view when data changes
REFRESH MATERIALIZED VIEW user_order_summary;
EXPLAIN SELECT * FROM user_order_summary;
Output:
Seq Scan on user_order_summary
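A plain REFRESH takes an exclusive lock and blocks reads on the view while it runs. With a unique index on the materialized view, REFRESH ... CONCURRENTLY avoids that; a sketch (index name is illustrative):
CREATE UNIQUE INDEX user_order_summary_uid ON user_order_summary (user_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY user_order_summary;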
EXPLAIN SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id;
Hash Join
Hash Cond: (o.user_id = u.id)
Used for medium-sized joins when there's no useful index.
EXPLAIN SELECT * FROM users u JOIN orders o ON u.id = o.user_id ORDER BY u.id, o.user_id;
Merge Join
Merge Cond: (u.id = o.user_id)
Efficient when both inputs are sorted.
EXPLAIN SELECT * FROM users WHERE EXISTS (
SELECT 1 FROM orders WHERE orders.user_id = users.id AND total > 200
);
Nested Loop
-> Seq Scan on users
-> Index Scan on orders
Best for small inner loop or when using indexes.
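To compare join strategies on the same query, the enable_* planner settings can switch one off temporarily (a diagnostic trick, not a production setting):
SET enable_hashjoin = off;
EXPLAIN SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.user_id;
-- The planner falls back to a Merge Join or Nested Loop.
RESET enable_hashjoin;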
PostgreSQL caches prepared statement plans. Consider:
- Generic Plans: Reused across executions.
- Custom Plans: Recompiled for each execution.
Check with:
EXPLAIN (ANALYZE, VERBOSE) EXECUTE prepared_stmt;
To force custom plan:
SET plan_cache_mode = force_custom_plan;
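For example, with a hypothetical prepared statement named user_orders:
PREPARE user_orders (int) AS
SELECT * FROM orders WHERE user_id = $1;
SET plan_cache_mode = force_custom_plan;
EXPLAIN (ANALYZE, VERBOSE) EXECUTE user_orders(1);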
Use these to help the planner make accurate decisions:
ANALYZE users;
Refreshes statistics used by the planner.
VACUUM ANALYZE orders;
Cleans up dead tuples and updates stats.
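To check when statistics were last refreshed, pg_stat_user_tables records the timestamps:
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('users', 'orders');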
Initialize a test database:
pgbench -i -s 10 mydb
Run performance benchmark:
pgbench -c 10 -j 2 -T 60 mydb
- -c: number of clients
- -j: number of threads
- -T: duration in seconds
Add custom SQL test:
pgbench -f custom_script.sql -T 60 mydb
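custom_script.sql holds whatever workload you want to measure; a minimal sketch against the sample schema might be:
\set uid random(1, 5)
SELECT * FROM orders WHERE user_id = :uid;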
- cost=START..END: estimated planner cost to start and finish
- rows: estimated number of rows
- width: average row size (in bytes)
- actual time: runtime details in ANALYZE
- Use EXPLAIN (ANALYZE, BUFFERS) for deeper analysis
- Use pg_stat_statements and auto_explain for production logging (see the example setup after this list)
- Always test plans with EXPLAIN ANALYZE
- Monitor slow queries via pg_stat_statements
- Avoid unnecessary nested loops on large tables
- Tune work_mem, random_page_cost, and effective_cache_size for performance
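A sketch of the logging setup mentioned above (thresholds are placeholders to adjust; mean_exec_time is the column name on PostgreSQL 13+):
-- postgresql.conf (restart required for shared_preload_libraries)
shared_preload_libraries = 'pg_stat_statements,auto_explain'
auto_explain.log_min_duration = '200ms'
-- then, in the database:
CREATE EXTENSION pg_stat_statements;
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 5;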
Happy Tuning! 🎯🔥