Postgres High-Availability Patterns

Postgres HA in 2026 is more achievable than ever. Four patterns cover most deployments; which one fits depends on your team and your scale.

Why HA matters

A single-node Postgres is a single point of failure for your entire stack. HA is what turns 'database outage' into a brief blip rather than a multi-hour incident.

Single-node risk. Any node failure means downtime: a hardware fault, a kernel panic, a full disk, or an OOM kill each stops the database.
HA posture. A standby is ready to take writes; failover happens in seconds to minutes, not hours.
RPO and RTO. HA reduces both: how much data you might lose (RPO) and how long recovery takes (RTO).
Cost reality. HA roughly doubles infrastructure cost; the calculus is whether your downtime cost exceeds that.
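
The cost calculus can be written down as a back-of-the-envelope check. The sketch below is illustrative only: the function name, its parameters, and the default assumption that HA removes 90% of outage hours are all hypothetical, and the numbers in the usage lines are made up, not measurements.

```python
def ha_breaks_even(outage_hours_per_year: float,
                   downtime_cost_per_hour: float,
                   extra_infra_cost_per_year: float,
                   outage_reduction: float = 0.9) -> bool:
    """Return True when the downtime HA is expected to avoid costs more than HA itself.

    outage_reduction is an assumed fraction of outage hours that HA removes;
    replace it with a figure from your own incident history.
    """
    avoided_hours = outage_hours_per_year * outage_reduction
    avoided_cost = avoided_hours * downtime_cost_per_hour
    return avoided_cost > extra_infra_cost_per_year


# Hypothetical inputs: 8 outage hours a year, 12,000 a year of extra infrastructure.
print(ha_breaks_even(8, 5000, 12000))  # expensive downtime: HA pays for itself
print(ha_breaks_even(8, 500, 12000))   # cheap downtime: it does not
```

The point of the exercise is less the boolean than being forced to write down a downtime cost per hour at all.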

Four patterns

1. Streaming replication + manual failover.
2. Patroni-managed cluster.
3. Stolon or pg_auto_failover.
4. Managed (RDS, Cloud SQL, Aurora).

Failover behavior

Failover speed is the headline metric. The pattern you choose dictates whether failover is a runbook step or a fully automatic event.

Streaming replication. Manual failover, 1 to 15 minutes; an engineer is paged and promotes the standby by following a runbook.
Patroni / Stolon. Automatic, ~30 seconds; a consensus layer (etcd or Consul) coordinates the leader election.
Managed (RDS, Aurora, Cloud SQL). Automatic, ~30 seconds; the cloud provider's control plane handles it end to end.
Connection draining. The connection layer in front of the application must be HA-aware (PgBouncer with retry, RDS Proxy); otherwise the application sees connection errors during failover.
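
Where the pooler cannot hide a failover, the application can absorb it with a bounded retry. The following is a minimal sketch, not a definitive implementation: `call_with_retry` and its parameters are hypothetical names, and the built-in `ConnectionError` stands in for whatever your driver raises (with psycopg that would typically be `OperationalError`). Retrying is only safe for operations that are idempotent or that failed before commit.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(operation: Callable[[], T],
                    retriable: Tuple[Type[BaseException], ...] = (ConnectionError,),
                    max_attempts: int = 6,
                    base_delay: float = 0.5,
                    max_delay: float = 8.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation, retrying on connection-level errors with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            # Exponential backoff capped at max_delay, with jitter so that
            # many clients do not reconnect to the new primary in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Tune `max_attempts` and the delays so that the total wait comfortably exceeds the failover time you actually observe; the defaults here are placeholders.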

When managed wins

The decision between managed and self-hosted Postgres is mostly about whether you want to operate a database team.

Sub-1TB scale. Managed is the default; the per-GB premium rarely adds up to a database engineer's salary.
Compliance constraints. Some regulated workloads cannot use managed services; self-hosting, typically with Patroni, then becomes mandatory.
Hyperscale. At multi-TB data sizes or extreme write rates, self-hosted Patroni with custom tuning can beat managed defaults.
Vendor lock-in. Aurora is fast but proprietary; RDS for PostgreSQL runs standard Postgres and is portable; weigh exit cost up front.
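
The criteria above can be read as a rough decision procedure. The sketch below encodes them in that order; it is a heuristic under stated assumptions, not a rule. The `Workload` fields, the `suggest_pattern` name, and the 2 TB cut-off used for "multi-TB" are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    data_size_tb: float
    managed_allowed: bool     # False when compliance rules out managed services
    extreme_write_rate: bool
    has_db_team: bool         # someone is willing and able to operate Postgres


def suggest_pattern(w: Workload, hyperscale_tb: float = 2.0) -> str:
    """Return a starting-point suggestion; hyperscale_tb is an assumed threshold."""
    if not w.managed_allowed:
        # Compliance removes the managed option entirely.
        return "self-hosted (Patroni)"
    if w.data_size_tb < 1.0:
        # Below 1 TB, managed is the default.
        return "managed"
    if (w.data_size_tb >= hyperscale_tb or w.extreme_write_rate) and w.has_db_team:
        # Large or write-heavy, with a team able to tune and operate it.
        return "self-hosted (Patroni)"
    return "managed"


print(suggest_pattern(Workload(0.3, True, False, False)))   # managed
print(suggest_pattern(Workload(12.0, True, True, True)))    # self-hosted (Patroni)
```

Vendor lock-in is deliberately left out of the function: exit cost is a judgment about the future, not an input you can measure today.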

Antipatterns

Single-node forever. An outage waiting to happen.
Managed without backup verification. You find out what is missing during a real failover or restore.
Custom HA without Patroni. You reinvent the wheel, badly.

What to do this week

Three moves. (1) Apply one of these patterns to your most critical database. (2) Measure failover time and any data loss in a drill, before and after the change. (3) Document the win and the constraint so the next change inherits the knowledge.
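
Alongside the measurements in step (2), a failover drill gives you a direct number for RTO: how long the database was unreachable. A minimal sketch, assuming a hypothetical probe loop has already recorded `(timestamp_seconds, succeeded)` pairs, for example by running `SELECT 1` once per second during the drill; the function name and the sample data are illustrative.

```python
from typing import Iterable, Optional, Tuple


def longest_outage_seconds(probes: Iterable[Tuple[float, bool]]) -> float:
    """Longest span from a first failed probe to the next successful one.

    probes must be in chronological order. An outage that is still open
    when the data ends is not counted, since its length is unknown.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    for timestamp, succeeded in probes:
        if not succeeded and outage_start is None:
            outage_start = timestamp            # outage begins
        elif succeeded and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None                 # outage ends
    return longest


# Made-up drill data: probes fail from t=2 until the first success at t=31.
drill = [(0.0, True), (1.0, True), (2.0, False), (3.0, False),
         (30.0, False), (31.0, True), (32.0, True)]
print(longest_outage_seconds(drill))  # 29.0
```

Record this number next to the pattern you chose; it is the figure the next person will want when deciding whether the setup is good enough.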