Postgres High-Availability Patterns
Postgres HA in 2026 is more achievable than ever. Four patterns cover most deployments; choose among them based on your team's capacity and your data scale.
Why HA matters
A single-node Postgres is a single point of failure for your entire stack. HA is what turns 'database outage' into a brief blip rather than a multi-hour incident.
- Single-node risk. Any node failure means downtime: a hardware fault, kernel panic, full disk, or OOM kill each stops the database.
- HA posture. A standby is ready to take writes; failover happens in seconds to minutes, not hours.
- RPO and RTO. HA reduces both the recovery point objective (how much data you might lose) and the recovery time objective (how long recovery takes).
- Cost reality. HA roughly doubles infrastructure cost; the calculus is whether your downtime cost exceeds that.
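The cost calculus above can be sketched as a back-of-the-envelope check. This is a minimal illustration, not a pricing model: the function name `ha_pays_off`, its parameters, and any figures you feed it are assumptions. The only input taken from the text is that HA roughly doubles infrastructure cost.

```python
def ha_pays_off(
    monthly_infra_cost: float,
    outage_hours_per_year: float,
    downtime_cost_per_hour: float,
    outage_hours_with_ha: float = 0.0,
) -> bool:
    """Return True if avoided downtime cost exceeds the extra yearly HA spend."""
    # Assumption from the text: HA roughly doubles infrastructure cost,
    # so the extra yearly spend is about one more year of the current bill.
    extra_ha_cost_per_year = monthly_infra_cost * 12

    # Downtime hours that HA is expected to remove (hypothetical estimates).
    avoided_hours = outage_hours_per_year - outage_hours_with_ha
    avoided_downtime_cost = avoided_hours * downtime_cost_per_hour

    return avoided_downtime_cost > extra_ha_cost_per_year
```

With a hypothetical 2,000 per month infrastructure bill and 8 expected outage hours per year, `ha_pays_off(2000, 8, 5000)` returns `True` and `ha_pays_off(2000, 8, 1000)` returns `False`: the answer depends almost entirely on what an hour of downtime costs you.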
Four patterns
- 1. Streaming replication + manual failover.
- 2. Patroni-managed cluster.
- 3. Stolon or pg_auto_failover.
- 4. Managed (RDS, Cloud SQL, Aurora).
Failover behavior
Failover speed is the headline metric. The pattern you choose dictates whether failover is a runbook step or a fully automatic event.
- Streaming replication. Manual failover, 1 to 15 minutes; an engineer promotes the standby on a runbook page.
- Patroni / Stolon. Automatic, ~30 seconds; a consensus store (etcd or Consul) coordinates leader election.
- Managed (RDS, Aurora, Cloud SQL). Automatic, typically from ~30 seconds to a couple of minutes depending on the service and configuration; the cloud provider's control plane handles it end to end.
- Connection draining. The connection path must be HA-aware (for example PgBouncer combined with application-side retry, or RDS Proxy); otherwise the application sees connection errors during failover.
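One way to absorb the connection errors that appear during failover is a retry wrapper with exponential backoff on the application side. The sketch below is an assumption-laden illustration: `run_with_retry` and its parameters are hypothetical names, and it catches Python's built-in `ConnectionError` so that it stays driver-independent. In a real application you would catch your database driver's connection-failure exception instead.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on ConnectionError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            # Out of attempts: surface the error to the caller.
            if attempt == max_attempts:
                raise
            # Double the delay each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Jitter spreads reconnects so clients do not hit the new
            # primary all at the same instant.
            sleep(delay * random.uniform(0.5, 1.0))
```

Retrying is only safe for idempotent operations: a write that was in flight when the primary failed may or may not have committed, so blindly repeating it can apply it twice. The defaults give roughly 4 to 7.5 seconds of total waiting, which would not cover a ~30 second failover; tune `max_attempts` and `max_delay` to the failover time you actually measure.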
When managed wins
The decision between managed and self-hosted Postgres is mostly about whether you want to operate a database team.
- Sub-1TB scale. Managed is the default; the per-GB premium is rarely worth a database engineer's salary.
- Compliance constraints. Some regulated workloads cannot use managed services; self-hosting, typically with Patroni, then becomes mandatory.
- Hyperscale. At multi-TB or extreme write rates, self-hosted Patroni with custom tuning beats managed defaults.
- Vendor lock-in. Aurora is fast but proprietary; RDS standard Postgres is portable; weigh exit cost up front.
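The rules of thumb above can be written down as a small decision helper. This is only a sketch of the list, not a sizing tool: `Workload`, `suggest_pattern`, and the `hyperscale_tb` threshold are hypothetical, and the default of 10 TB is an assumption because the text says only "multi-TB". Treating everything between 1 TB and that threshold as managed is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    data_size_tb: float
    managed_allowed: bool = True       # False when compliance rules out managed services
    extreme_write_rate: bool = False   # hypothetical flag for very heavy write load


def suggest_pattern(w: Workload, hyperscale_tb: float = 10.0) -> str:
    """Map the rules of thumb from the list above to a suggested pattern."""
    # Compliance constraints override everything else.
    if not w.managed_allowed:
        return "self-hosted Patroni"
    # Hyperscale: large data or extreme write rates favor custom tuning.
    if w.extreme_write_rate or w.data_size_tb >= hyperscale_tb:
        return "self-hosted Patroni"
    # Otherwise managed is the default, especially below 1 TB.
    return "managed"
```

Vendor lock-in is deliberately left out of the function: choosing between Aurora and standard RDS Postgres is an exit-cost judgment that a flag does not capture well.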
Antipatterns
- Single-node forever. An outage waiting to happen.
- Managed without backup verification. The first real failover or restore becomes the test, and that is when the surprises arrive.
- Custom HA without Patroni. You end up reinventing leader election, badly.
What to do this week
Three moves. (1) Pick the pattern that fits your most critical database and apply it there first. (2) Run a failover drill and measure recovery time and data loss before and after. (3) Document the win and the constraints so the next change inherits the knowledge.