PostgreSQL does not seem to optimize common table expressions (CTE) very well. At the same time, is seems to optimize properly when views are involved. Any idea why? Is that accidental or is it a technical/theoretical/essential reason? Examples below.
Does not seem optimized.
sap=# explain with j as (select * from suppliers natural join shipments) select * from j where status > 20;
QUERY PLAN
---------------------------------------------------------------------------------
CTE Scan on j (cost=80.85..119.10 rows=567 width=70)
Filter: (status > 20)
CTE j
-> Hash Join (cost=30.48..80.85 rows=1700 width=70)
Hash Cond: ((shipments.sid)::text = (suppliers.sid)::text)
-> Seq Scan on shipments (cost=0.00..27.00 rows=1700 width=18)
-> Hash (cost=19.10..19.10 rows=910 width=59)
-> Seq Scan on suppliers (cost=0.00..19.10 rows=910 width=59)
(8 rows)
Seems optimized.
sap=# explain select * from suppliers natural join shipments where status > 20;
QUERY PLAN
-------------------------------------------------------------------------
Hash Join (cost=25.16..64.20 rows=566 width=70)
Hash Cond: ((shipments.sid)::text = (suppliers.sid)::text)
-> Seq Scan on shipments (cost=0.00..27.00 rows=1700 width=18)
-> Hash (cost=21.38..21.38 rows=303 width=59)
-> Seq Scan on suppliers (cost=0.00..21.38 rows=303 width=59)
Filter: (status > 20)
(6 rows)
Seems optimized.
sap=# create view j as select * from suppliers natural join shipments;
CREATE VIEW
sap=# select * from j where status > 20;
sid | name | status | city | pid | qty
-----+-------+--------+-------+-----+-----
S3 | Blake | 30 | Paris | P2 | 200
(1 row)
sap=# explain select * from j where status > 20;
QUERY PLAN
-------------------------------------------------------------------------
Hash Join (cost=25.16..64.20 rows=566 width=70)
Hash Cond: ((shipments.sid)::text = (suppliers.sid)::text)
-> Seq Scan on shipments (cost=0.00..27.00 rows=1700 width=18)
-> Hash (cost=21.38..21.38 rows=303 width=59)
-> Seq Scan on suppliers (cost=0.00..21.38 rows=303 width=59)
Filter: (status > 20)
(6 rows)
What about this one? explain select * from (select * from suppliers natural join shipments) as j where status > 20;
You should also try EXPLAIN ANALYSE on the queries. I think what's happening is that the CTE can't be turned into a join, because CTEs have to run only once before the main query, and PG doesn't do the analysis to vertify that that would still be the case here. It's also stopping the WHERE clause being pushed into the CTE, for similar reasons. The fact that the filter isn't applied immediately to the suppliers table is likely the biggest performance difference.