Created
April 21, 2026 19:55
-
-
Save possebon/481f8148a3ea4fa6cfbace09bb12ec68 to your computer and use it in GitHub Desktop.
PostgreSQL: find what's blocking autovacuum — idle-in-transaction backends + replication slots pinning xmin
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| -- Find what's blocking autovacuum across a PostgreSQL cluster. | |
| -- | |
| -- Autovacuum can only reclaim dead tuples older than the oldest running | |
| -- transaction *anywhere in the cluster*. One old idle-in-transaction session, | |
| -- or one inactive replication slot holding back xmin, can silently stall | |
| -- reclaim across dozens of tables. | |
| -- | |
| -- This query returns both culprit classes in one result set: | |
| -- kind = 'backend' → a client-backend transaction older than 10 minutes | |
| -- kind = 'replication_slot' → a slot pinning xmin or catalog_xmin | |
| -- | |
| -- Tested on PostgreSQL 12+. Column names stable back to 11. | |
| -- Blog post: https://www.linkedin.com/in/fernando-possebon/ | |
| SELECT 'backend' AS kind, | |
| pid::text AS identifier, | |
| state, | |
| usename, | |
| datname, | |
| EXTRACT(EPOCH FROM (now() - xact_start))::int AS xact_age_seconds, | |
| query | |
| FROM pg_stat_activity | |
| WHERE xact_start IS NOT NULL | |
| AND (now() - xact_start) > interval '10 minutes' | |
| AND backend_type = 'client backend' | |
| UNION ALL | |
| SELECT 'replication_slot' AS kind, | |
| slot_name, | |
| CASE WHEN active THEN 'active' ELSE 'inactive' END, | |
| NULL, | |
| database, | |
| NULL, | |
| slot_type | |
| FROM pg_replication_slots | |
| WHERE xmin IS NOT NULL OR catalog_xmin IS NOT NULL | |
| ORDER BY xact_age_seconds DESC NULLS LAST | |
| LIMIT 50; | |
| -- Three things worth knowing: | |
| -- | |
| -- backend_type = 'client backend' | |
| -- Filters out walsender, autovacuum workers, and background workers. | |
| -- You want the application sessions, not Postgres's own plumbing. | |
| -- | |
| -- pg_replication_slots.xmin = transactional slot (physical replication) | |
| -- pg_replication_slots.catalog_xmin = logical slot (logical decoding) | |
| -- Both can hold back the horizon. A logical slot whose consumer died | |
| -- weeks ago is the #1 silent cause of stalled vacuum I have seen. | |
| -- | |
| -- pg_prepared_xacts can also block (2PC transactions left uncommitted). | |
| -- Rare but devastating. Add a third UNION if your cluster uses 2PC. | |
| -- How to act on the results: | |
| -- | |
| -- Idle client backend: | |
| -- SELECT pg_terminate_backend(<pid>); | |
| -- Then teach the application to close transactions. Shortest-lived | |
| -- connection is almost never the problem; idle-in-transaction is. | |
| -- | |
| -- Forgotten replication slot (consumer truly gone): | |
| -- SELECT pg_drop_replication_slot('<slot_name>'); | |
| -- Check twice. Dropping an in-use slot loses replication state. | |
| -- | |
| -- Prepared xact: | |
| -- ROLLBACK PREPARED '<gid>'; | |
| -- Then find out why your app is using 2PC and whether it should be. | |
| -- | |
| -- Preventive alert (put this in your monitoring, before you need it): | |
| -- Page when ANY client-backend xact_age > 15 minutes. Full stop. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment