Vacuum Tuning in Postgres: A Deep Dive
Postgres vacuum is the source of countless ‘our database got slow’ incidents. The tuning is mechanical when you understand the four settings.
Why vacuum matters
Postgres uses MVCC, which means dead tuples accumulate on every UPDATE and DELETE. Vacuum reclaims them; skip it and tables bloat, queries slow, the database falls behind.
- MVCC consequence. Dead tuples accumulate; vacuum reclaims them; without it, tables grow indefinitely.
- Bloat slows queries. The planner reads more pages to find live rows; latency climbs slowly.
- Default conservative. Autovacuum's defaults are tuned for small tables; large or active tables need explicit tuning.
- Wraparound risk. Failure to vacuum eventually triggers transaction-ID wraparound, which forces an emergency vacuum.
Four key settings
- autovacuum_vacuum_scale_factor, trigger threshold.
- autovacuum_max_workers, concurrency.
- maintenance_work_mem, per-vacuum memory.
- autovacuum_vacuum_cost_delay, throttling.
Per-table overrides
Cluster-wide settings cannot serve every table well. Per-table overrides tune the hot tables without disrupting the cold ones.
- Override syntax.
ALTER TABLE x SET (autovacuum_vacuum_scale_factor = 0.05)for hot tables. - Default rationale. 0.2 means vacuum when 20% of rows are dead; tolerable for cold tables, painful for hot ones.
- Hot-table target. 0.05 vacuums earlier and keeps bloat down; the cost is more frequent vacuum runs.
- Identification.
pg_stat_user_tablesshows dead-tuple counts; the worst offenders surface immediately.
Symptoms of bad tuning
Bad vacuum tuning announces itself slowly. Three symptoms surface before incidents; catching any of them lets you tune before the page fires.
- Bloat increasing. Table size grows out of proportion to row count;
pg_stat_user_tablesshows the dead-tuple share. - Stale planner stats. Query plans drift; ANALYZE not running often enough; bad plan choices follow.
- Long autovacuum runs. Vacuum never completes between cycles; the queue grows; bloat compounds.
- Tuning signal. Each symptom is an early warning; address before they become incidents.
Antipatterns
- Default autovacuum at scale. Bloat.
- Disabling autovacuum. Catastrophe within weeks.
- Manual VACUUM as substitute. Fragile; error-prone.
What to do this week
Three moves. (1) Apply this pattern to your most-loaded table. (2) Measure query latency / write throughput before/after. (3) Document the win and the constraint so the next refactor inherits the knowledge.