Skip to content

Instantly share code, notes, and snippets.

@blambeau
Last active August 29, 2015 14:11
Show Gist options
  • Save blambeau/03e9c8239ce9818a45a8 to your computer and use it in GitHub Desktop.
Save blambeau/03e9c8239ce9818a45a8 to your computer and use it in GitHub Desktop.
PostgreSQL and CTEs

Why doesn't PostgreSQL optimize Common Table Expressions?

PostgreSQL does not seem to optimize common table expressions (CTE) very well. At the same time, is seems to optimize properly when views are involved. Any idea why? Is that accidental or is it a technical/theoretical/essential reason? Examples below.

With CTE, restriction after join

Does not seem optimized.

sap=# explain with j as (select * from suppliers natural join shipments) select * from j where status > 20;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 CTE Scan on j  (cost=80.85..119.10 rows=567 width=70)
   Filter: (status > 20)
   CTE j
     ->  Hash Join  (cost=30.48..80.85 rows=1700 width=70)
           Hash Cond: ((shipments.sid)::text = (suppliers.sid)::text)
           ->  Seq Scan on shipments  (cost=0.00..27.00 rows=1700 width=18)
           ->  Hash  (cost=19.10..19.10 rows=910 width=59)
                 ->  Seq Scan on suppliers  (cost=0.00..19.10 rows=910 width=59)
(8 rows)

Manually inlined

Seems optimized.

sap=# explain select * from suppliers natural join shipments where status > 20;
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=25.16..64.20 rows=566 width=70)
   Hash Cond: ((shipments.sid)::text = (suppliers.sid)::text)
   ->  Seq Scan on shipments  (cost=0.00..27.00 rows=1700 width=18)
   ->  Hash  (cost=21.38..21.38 rows=303 width=59)
         ->  Seq Scan on suppliers  (cost=0.00..21.38 rows=303 width=59)
               Filter: (status > 20)
(6 rows)

With a view

Seems optimized.

sap=# create view j as select * from suppliers natural join shipments;
CREATE VIEW
sap=# select * from j where status > 20;
 sid | name  | status | city  | pid | qty
-----+-------+--------+-------+-----+-----
 S3  | Blake |     30 | Paris | P2  | 200
(1 row)

sap=# explain select * from j where status > 20;
                               QUERY PLAN
-------------------------------------------------------------------------
 Hash Join  (cost=25.16..64.20 rows=566 width=70)
   Hash Cond: ((shipments.sid)::text = (suppliers.sid)::text)
   ->  Seq Scan on shipments  (cost=0.00..27.00 rows=1700 width=18)
   ->  Hash  (cost=21.38..21.38 rows=303 width=59)
         ->  Seq Scan on suppliers  (cost=0.00..21.38 rows=303 width=59)
               Filter: (status > 20)
(6 rows)
@ejrh
Copy link

ejrh commented Dec 10, 2014

What about this one? explain select * from (select * from suppliers natural join shipments) as j where status > 20;

You should also try EXPLAIN ANALYSE on the queries. I think what's happening is that the CTE can't be turned into a join, because CTEs have to run only once before the main query, and PG doesn't do the analysis to vertify that that would still be the case here. It's also stopping the WHERE clause being pushed into the CTE, for similar reasons. The fact that the filter isn't applied immediately to the suppliers table is likely the biggest performance difference.

@arthurprs
Copy link

This can be a good thing sometimes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment