Tested two potential optimizations based on a comment on the TypeGraph post in /r/postgresql from u/Deep_Ad1959:
- Native
CYCLEclause vs the manualARRAY[]+!= ALL(path)cycle detection that TypeGraph currently emits (packages/typegraph/src/query/dialect/postgres.ts:178-192). - Prepared-statement plan caching can pick a bad generic plan after
the 5th execution under
plan_cache_mode = autowhen start-node selectivity varies widely.
- Postgres: PG18 in a dedicated container (
pgvector/pgvector:pg18). - Graph: 10,000 nodes (
Person), ~21,500knowsedges with a heavy-tailed degree distribution:- 5 hub nodes with 500 outgoing edges each
- 100 mid nodes with 30 outgoing edges each
- ~9,895 leaves with 0-3 outgoing edges
- +500 reciprocal pairs to seed cycles
- Schema: matches what
createPostgresTablesproduces — sametypegraph_nodes,typegraph_edges, andtypegraph_edges_from_idxshape the dialect emitter relies on. - Traversal depth: 5 hops, cycle prevention on (the default).
- Iterations: 2 warmup + 10 samples per (start-node, variant) pair.
- Bench file:
packages/benchmarks/src/recursive-pg-bench.ts.
WITH RECURSIVE reachable AS (
SELECT n.id, n.kind, 0 AS depth, ARRAY[n.id]::text[] AS path
FROM typegraph_nodes n
WHERE n.graph_id = $1 AND n.id = $2 AND n.deleted_at IS NULL
UNION ALL
SELECT n.id, n.kind, r.depth + 1 AS depth, (r.path || n.id)::text[]
FROM reachable r
JOIN typegraph_edges e
ON e.from_id = r.id
AND e.graph_id = $1
AND e.kind = 'knows'
AND e.deleted_at IS NULL
JOIN typegraph_nodes n
ON n.graph_id = e.graph_id
AND n.id = e.to_id
AND n.kind = e.to_kind
AND n.deleted_at IS NULL
WHERE r.depth < $3 AND n.id != ALL(r.path)
)
SELECT id, depth FROM reachable;WITH RECURSIVE reachable AS (
SELECT n.id, n.kind, 0 AS depth
FROM typegraph_nodes n
WHERE n.graph_id = $1 AND n.id = $2 AND n.deleted_at IS NULL
UNION ALL
SELECT n.id, n.kind, r.depth + 1 AS depth
FROM reachable r
JOIN typegraph_edges e
ON e.from_id = r.id
AND e.graph_id = $1
AND e.kind = 'knows'
AND e.deleted_at IS NULL
JOIN typegraph_nodes n
ON n.graph_id = e.graph_id
AND n.id = e.to_id
AND n.kind = e.to_kind
AND n.deleted_at IS NULL
WHERE r.depth < $3
) CYCLE id SET is_cycle USING path
SELECT id, depth FROM reachable WHERE NOT is_cycle;| Start | Rows reached | manual-array p50 | native-cycle p50 | Δ p50 |
|---|---|---|---|---|
| hub 0 | 14,677 | 61.8 ms | 65.3 ms | +5.7% |
| hub 1 | 16,952 | 69.4 ms | 73.9 ms | +6.5% |
| hub 2 | 17,355 | 73.0 ms | 77.0 ms | +5.4% |
| mid 0 | 1,591 | 7.2 ms | 7.8 ms | +7.6% |
| mid 1 | 758 | 3.7 ms | 3.9 ms | +4.3% |
| mid 2 | 522 | 3.2 ms | 2.9 ms | -7.5% |
| leaf | 5-77 | 0.4-0.8 ms | 0.4-0.8 ms | within noise |
Native CYCLE is consistently slower (~5-8%) on the queries large
enough to matter. Within noise on small queries. There is no scenario in
this bench where it beats the manual array pattern.
Why: PG's CYCLE is syntactic sugar over the same text[] tracking. It
does not switch detection to a hash or bitmap. The extra work appears
to come from the implicit cycle-marker column maintenance plus the
NOT is_cycle filter on output. The path-elision win the commenter
posited never materializes because the recursion still needs id and
depth columns in the output projection.
Same SQL as Variant A above. Each variant runs 2 warmup + 20 alternating executions (hub, leaf, hub, leaf, …) on a dedicated checked-out client.
- unnamed: no
namefield passed topg.Client#query— driver issues unnamed parse-execute each time. - named-auto: named prepared statement, default
plan_cache_mode(auto). PG decides custom-vs-generic after 5 executions. - named-force-custom: named prepared statement,
SET plan_cache_mode = 'force_custom_plan'. - named-force-generic: named prepared statement,
SET plan_cache_mode = 'force_generic_plan'.
| Variant | hub p50 | hub p95 | leaf p50 | leaf p95 | pg_prepared_statements |
|---|---|---|---|---|---|
| unnamed | 68.8 ms | 79.1 ms | 0.96 ms | 1.78 ms | (n/a) |
| named-auto | 68.0 ms | 72.3 ms | 0.77 ms | 1.12 ms | generic=17, custom=5 |
| named-force-custom | 68.9 ms | 87.8 ms | 0.81 ms | 1.31 ms | generic=0, custom=22 |
| named-force-generic | 69.2 ms | 73.8 ms | 0.77 ms | 0.98 ms | generic=22, custom=0 |
The plan-mode flip is real. pg_prepared_statements confirms that under
default auto mode, PG used a custom plan for the first 5 executions
(including warmups) and then switched to generic for the remaining 17.
This matches the documented behavior the Reddit commenter cited.
But across all four variants, latency is statistically identical for
the same workload class. Forcing custom doesn't help; forcing generic
doesn't hurt; using the default auto (where the flip happens) is
indistinguishable from unnamed parse-execute.
Why this benchmark didn't expose pain even though the planner flipped:
- The query's cost is dominated by the recursive worktable join, not by the initial start-node lookup.
- Both
typegraph_edges_from_idxand thetypegraph_nodesprimary key serve any start-node selectivity well enough that there's no hash-vs-nested-loop knife-edge to fall off when the planner uses averaged statistics.
This is a meaningful negative result. It does not disprove the risk
in general — a different query shape with a planner knife-edge at the
start-node lookup could still get burned by auto. But for the
recursive traversals TypeGraph emits today, prepared statements with
auto plan cache are fine.