A PL/pgSQL function that reports installed extensions across all PostgreSQL databases from a single connection, by parsing raw heap files instead of opening a connection to each database.
`pg_extension` is a per-database catalog. There is no cross-database view of
installed extensions in PostgreSQL. The standard way to collect this data in
postgres_exporter is `--auto-discover-databases`, which opens a new connection
to every database on every scrape.
At small scale this is fine. At hundreds or thousands of databases it becomes a real problem:
- Each scrape opens and closes N connections, each with authentication, process fork, and catalog lookup overhead
- At 1000 databases on a 60-second scrape interval, you're hammering the postmaster with connection churn continuously
- `max_connections` has to be sized for burst, not steady state
- PgBouncer helps but adds operational complexity and doesn't eliminate the per-database-per-scrape connection entirely
The goal: collect (datname, extname) rows for every installed extension
across every database, issued as a single query from a single connection.
`pg_extension` has a fixed catalog OID of 3079 in every PostgreSQL
installation. Its heap file therefore always lives at:

`$PGDATA/base/<db_oid>/3079`
From a single connection to the postgres database, we use
pg_read_binary_file() to read each database's file, then parse the heap page
format directly in PL/pgSQL.
Each file is a sequence of 8192-byte heap pages. For each page:

- Read `pd_lower` (bytes 12–13, little-endian uint16) to find the end of the line pointer array
- For each 4-byte line pointer starting at offset 24:
  - Extract `lp_off` (bits 0–14) — offset to the tuple within the page
  - Extract `lp_flags` (bits 15–16) — skip anything that isn't `LP_NORMAL` (1)
- At the tuple, read `t_infomask` (bytes 20–21) and check:
  - `HEAP_XMIN_COMMITTED` (0x0100) — inserting transaction committed
  - `HEAP_XMAX_INVALID` (0x0800) — tuple has not been deleted
- Read `t_hoff` (byte 22) — offset from tuple start to user data
- In `pg_extension` user data (PostgreSQL 12+):
  - bytes 0–3: `oid`
  - bytes 4–67: `extname` (Name type, 64 bytes, null-padded)
- Extract `extname`, strip null bytes, yield (datname, extname).
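The page walk above can be sketched in Python. This is an illustration of the byte layout, not the PL/pgSQL function itself; `parse_page` is a hypothetical name, and the constants are the ones listed above:

```python
import struct

LP_NORMAL = 1                  # line pointer state: normal, in-use tuple
HEAP_XMIN_COMMITTED = 0x0100   # inserting transaction known committed
HEAP_XMAX_INVALID = 0x0800     # no (live) deleting transaction

def parse_page(page: bytes) -> list[str]:
    """Extract extension names from one 8192-byte pg_extension heap page."""
    names = []
    # pd_lower (bytes 12-13, little-endian uint16) marks the end of the
    # line pointer array
    pd_lower = struct.unpack_from("<H", page, 12)[0]
    # 4-byte line pointers start at offset 24
    for lp_pos in range(24, pd_lower, 4):
        lp = struct.unpack_from("<I", page, lp_pos)[0]
        lp_off = lp & 0x7FFF           # bits 0-14: tuple offset in page
        lp_flags = (lp >> 15) & 0x3    # bits 15-16: line pointer state
        if lp_flags != LP_NORMAL:
            continue
        # t_infomask lives at tuple offset +20
        t_infomask = struct.unpack_from("<H", page, lp_off + 20)[0]
        if not (t_infomask & HEAP_XMIN_COMMITTED):
            continue  # inserting transaction not (yet) known committed
        if not (t_infomask & HEAP_XMAX_INVALID):
            continue  # tuple may have been deleted
        t_hoff = page[lp_off + 22]     # offset from tuple start to user data
        data = lp_off + t_hoff
        # user data layout (PG 12+): 4-byte oid, then 64-byte null-padded extname
        extname = page[data + 4 : data + 68].split(b"\x00", 1)[0]
        names.append(extname.decode())
    return names
```

A production version would also bounds-check `lp_off` and `t_hoff` against the page size before dereferencing; a corrupt page can otherwise send the parser out of bounds.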
- PostgreSQL 12+ (OID became a regular user column in PG12; pre-12 it lives in the tuple header at a different offset)
- `pg_read_binary_file()` requires superuser or the `pg_read_server_files` role
- Assumes `relfilenode == 3079` for `pg_extension` in every database — true unless `VACUUM FULL` was run on the catalog, which only a superuser can do and almost never happens
- Reads from disk; pages modified since the last checkpoint may not be visible. Acceptable for monitoring — `pg_extension` barely changes and background checkpointing covers it within ~5 minutes
| File | Purpose |
|---|---|
| `get_all_extensions.sql` | The function itself |
| `bench_get_all_extensions.sql` | Benchmarking wrapper that captures /proc/self/io deltas |
| `Dockerfile` | PostgreSQL 17 image with sysstat/procps for benchmarking |
| `postgresql-bench.conf` | Tuned postgresql.conf for benchmark runs |
| `setup_databases.sh` | Creates N bench databases in parallel with extensions installed |
| `bench.sh` | Runs the benchmark at a given scale, reports latency + I/O |
| `run_all.sh` | Full suite: spins up Docker, runs at 100/1000/10000 databases |
```shell
# Run against the database your exporter connects to (e.g., postgres)
psql -d postgres -f get_all_extensions.sql
```

```sql
-- Grant access to the exporter role
GRANT EXECUTE ON FUNCTION public.get_all_extensions() TO <exporter_role>;
GRANT pg_read_server_files TO <exporter_role>;

SELECT datname, extname
FROM public.get_all_extensions()
ORDER BY datname, extname;
```

```
      datname       |      extname
--------------------+--------------------
 airbyte            | plpgsql
 events_development | citext
 events_development | plpgsql
 myapp              | pg_stat_statements
 myapp              | plpgsql
 myapp              | uuid-ossp
```
The companion query YAML (in the postgres_exporter repo at
queries/pg_extensions_installed.yaml) emits one gauge per
(datname, extname) pair:

```
pg_extensions_installed_installed{datname="myapp",extname="pg_stat_statements",server="..."} 1
```

Run with:

```shell
postgres_exporter --extend.query-path=queries/pg_extensions_installed.yaml
```
Requires Docker (OrbStack or Docker Desktop) and psql on the host.
```shell
# Full run: builds image, creates 100/1000/10000 databases, benchmarks each
./run_all.sh

# Keep the container running after the run (useful for further testing)
./run_all.sh --keep

# Benchmark a specific scale against an already-running container
./setup_databases.sh 1000 40   # create 1000 databases, 40 parallel workers
./bench.sh 1000 7              # 7 timed runs
```

Benchmarked on PostgreSQL 17.9 in Docker (OrbStack, arm64 Mac),
shared_buffers=256MB, autovacuum=off, track_io_timing=on. All timed
runs after a warmup call (OS page cache warm). ~10% of databases have a second
extension (fuzzystrmatch) installed.
| Scale | min | avg | p99 | max |
|---|---|---|---|---|
| 100 DBs | 10ms | 14ms | 19ms | 21ms |
| 1,000 DBs | 21ms | 26ms | 28ms | 28ms |
| 10,000 DBs | 109ms | 152ms | 224ms | 227ms |
Latency scales roughly linearly. At 10k databases the function reads 78MB of heap files; at that point it becomes I/O-bound rather than CPU-bound.
pg_read_binary_file() bypasses the PostgreSQL buffer pool and reads directly
from the OS. I/O is measured via /proc/self/io deltas inside the postgres
backend process.
| Scale | rchar (all read() syscalls) | read_bytes (physical disk, page cache misses) | pg_extension files on disk |
|---|---|---|---|
| 100 | 0.79 MB | 0 MB (warm cache) | 824 KB |
| 1,000 | 7.82 MB | 0 MB (warm cache) | 7.9 MB |
| 10,000 | 78.1 MB | 0 MB (warm cache) | ~78 MB |
rchar scales at exactly N × 8 KB — one read() syscall per database.
read_bytes is 0 on warm-cache runs; on a cold call (post-reboot, post-PITR)
it would equal rchar.
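The N × 8 KB claim is plain arithmetic. A sketch, with a hypothetical helper name, ignoring the few extra catalog pages each call touches:

```python
PAGE_SIZE = 8192  # one heap page read per database via pg_read_binary_file()

def expected_rchar_mb(n_databases: int) -> float:
    """Predicted read() volume in MiB for one call at a given database count."""
    return n_databases * PAGE_SIZE / (1024 ** 2)
```

At 10,000 databases this predicts 78.125 MiB, in line with the measured 78.1 MB.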
Backend RSS is flat at ~9 MB across all scales. The function reads each file, parses it, and discards the bytea. It does not accumulate state. The container memory growth with scale is OS page cache filling with heap files.
| Scale | CPU% during call | Wall time (including psql client) |
|---|---|---|
| 100 | 79% | 0.10s |
| 1,000 | 64% | 0.09s |
| 10,000 | 34% | 0.22s |
CPU% drops as scale grows because at large N the function is I/O-bound. Absolute CPU time per call remains low.
This approach bypasses the PostgreSQL storage engine entirely. That means it bypasses all of PostgreSQL's correctness and safety guarantees. The failure mode for almost every risk below is silent wrong results with no error signal.
VACUUM FULL rewrites a table and assigns a new relfilenode. If someone runs
VACUUM FULL on pg_extension in another database (requires superuser), the
file at base/<db_oid>/3079 either doesn't exist, belongs to a different
relation, or contains unrelated data. The function either skips that database
silently or — worse — parses a completely different relation's heap pages and
returns garbage extension names. The function verifies the connected database's
relfilenode and emits a WARNING if it has drifted, but cannot check other
databases.
Practical likelihood: very low. Only a superuser can run VACUUM FULL on a
system catalog, and it would have to be targeted or a blanket cluster-wide
VACUUM FULL.
We check HEAP_XMIN_COMMITTED to confirm the inserting transaction committed.
Hint bits are written lazily — only after a backend reads the tuple and
resolves its transaction status. This means:
- Right after `pg_upgrade`: hint bits may not be set on any tuple in any database. The function returns zero extensions everywhere until each database is read by a normal query.
- Right after a PITR restore from backup: same.
- A brand-new database before PostgreSQL has read its own `pg_extension`.
Result: zero rows returned for affected databases, silently.
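The visibility filter reduces to two bitmask tests. A minimal sketch (hypothetical function name, constants from the parsing section):

```python
HEAP_XMIN_COMMITTED = 0x0100  # inserting transaction known committed
HEAP_XMAX_INVALID = 0x0800    # no (live) deleting transaction

def looks_visible(t_infomask: int) -> bool:
    """Heuristic visibility check: both hint bits must already be set on disk."""
    return bool(t_infomask & HEAP_XMIN_COMMITTED) and \
           bool(t_infomask & HEAP_XMAX_INVALID)

# Post-restore, a live tuple's on-disk infomask may carry only XMAX_INVALID
# (set at insert time) while the XMIN hint is still unset, so
# looks_visible(0x0800) is False and the row silently disappears.
```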
On a streaming replica, pg_read_binary_file() reads from disk while WAL
replay runs concurrently and writes to the same pages. WAL replay takes locks
on buffers, not on physical files. We can read() a page mid-replay, getting
a torn page that we then parse as if it were valid. Result: corrupt or missing
extension names with no error.
If data_checksums is enabled, PostgreSQL validates page checksums when pages
enter the buffer pool. pg_read_binary_file() bypasses the buffer pool, so no
checksum validation happens. A corrupt page — torn write, bit rot, faulty
storage — is parsed as a valid heap page. Silent wrong results.
The function requires pg_read_server_files, which grants the ability to read
any file the PostgreSQL process can read — not just base/*/3079. This
includes postgresql.conf, pg_hba.conf, SSL private keys, WAL segments,
and heap files for all relations in all databases. The SECURITY DEFINER
wrapper limits what callers can do with the function body, but the owning
role's privilege is permanently broad.
Symlink attack: if an attacker has OS write access to the data directory,
they can replace base/<db_oid>/3079 with a symlink to any file
(/etc/ssl/private/server.key, etc.). The function reads the target file into
the postgres process. An attacker who understands the parser's structure
(check infomask at known offsets, extract 64 bytes at lp_off + t_hoff + 4)
could craft the symlink target to be a file whose bytes satisfy the visibility
checks, causing the function to return 64-byte chunks of that file as
"extension names" in the result set — a covert 64-bytes-at-a-time file
exfiltration channel, visible in Prometheus metrics.
Prerequisites: OS write access to the data directory + EXECUTE on the function. The data directory should already be protected from non-postgres writes, but container misconfigs, shared NFS mounts, or compromised sidecars can violate this assumption.
Tuple header field offsets and pg_extension column layout are hardcoded.
These are PostgreSQL internals with no stability guarantee:
- A future major version adding a column before `extname` would shift our offset and produce silent garbage
- We never validate the page format version (`pd_pagesize_version`) — a page from a version with a different layout would be silently misread
- Post-`pg_upgrade`, pages written by the old version use the old format until they are rewritten by normal operations
Reading 10,000 files sequentially takes ~150ms. Between reading the first and last database, extensions can be installed, removed, or databases dropped. The result set has no consistent point-in-time. For monitoring this is usually acceptable; for compliance enforcement it is not.
PostgreSQL splits heap files into 1GB segments: 3079, 3079.1, 3079.2.
We only read 3079. A database with enough extensions to overflow a segment
(millions) would have the rest silently omitted. Not a real concern for
pg_extension specifically, but a latent correctness bug if this pattern is
adapted for larger catalogs.
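If this pattern were adapted to a catalog that can exceed 1 GB, the fix is to enumerate segments until one is missing. A sketch of the naming scheme only, with a hypothetical helper name (the existence check is left to the caller):

```python
from itertools import count

def segment_paths(db_oid: int, relfilenode: int = 3079):
    """Yield heap segment paths in on-disk order: <rfn>, <rfn>.1, <rfn>.2, ..."""
    for seg in count():
        suffix = "" if seg == 0 else f".{seg}"
        yield f"base/{db_oid}/{relfilenode}{suffix}"
```

A caller would read each path in turn and stop at the first file that does not exist.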
Our visibility filter is a heuristic, not real MVCC:
- False negatives: an extension whose delete was aborted (or whose xmax is otherwise unresolved) and whose `HEAP_XMAX_INVALID` hint bit hasn't been set again yet gets incorrectly excluded — a live extension appears missing.
- False positives: a deleted extension whose xmax committed but whose on-disk `HEAP_XMAX_INVALID` bit hasn't been cleared yet appears as still installed.
Both windows are very short for pg_extension given its near-zero churn, but
they exist.
| Risk | Likelihood | Impact | Detectable? |
|---|---|---|---|
| Relfilenode drift (VACUUM FULL on catalog) | Very low | Silent wrong results | No |
| Hint bits unset (upgrade / restore) | Low–medium | Silent missing data | No |
| Replica WAL replay race | Medium | Corrupt results | No |
| Checksum bypass | Low–medium | Corrupt results | No |
| Symlink → file exfiltration | Low (requires data dir write access) | Security breach | No |
| pg_read_server_files scope | Depends on threat model | Security breach | N/A |
| Hardcoded layout vs future PG version | Low | Silent wrong results | No |
| Snapshot inconsistency | Certain | Stale data (by design) | By design |
| Multi-segment heap | Near zero | Silent truncation | No |
| MVCC approximation | Very low | ±1 extension, short window | No |
The theme throughout: bypassing the database engine means bypassing its correctness guarantees. The failure mode is almost always a silent wrong answer. For a read-only monitoring function over nearly-static data, this tradeoff may be acceptable — but go in with clear eyes.