Database Backup Strategy: The 3-2-1 Rule for 2026
Backups are useless without verification. The 3-2-1 rule is mechanical; the verification is the discipline.
Why 3-2-1 matters
The 3-2-1 rule is the floor for database durability: 3 copies, 2 different media, 1 offsite. Anything less is a single failure away from data loss.
- One backup. Single point of failure; one bad disk or one bad operator and the data is gone.
- Three copies. Two on-site, one offsite; survives single-disk failure, host failure, and site failure.
- Two media types. Keep copies on different media or in different backup formats so one correlated failure cannot destroy all of them (e.g. a logical dump plus a volume snapshot).
- One offsite. Different region or different cloud; survives a regional outage or account compromise.
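The rule above is mechanical enough to check automatically. The following is a minimal sketch, assuming a hypothetical inventory of backup copies; the `BackupCopy` type, the `check_3_2_1` function, and the medium and location labels are illustrative names, not part of any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data. All field values are illustrative labels."""
    name: str
    medium: str    # e.g. "local-disk", "object-storage"
    location: str  # e.g. "dc-east", "cloud-b-west"


def check_3_2_1(copies: list[BackupCopy], primary_location: str) -> list[str]:
    """Return a list of violations; an empty list means the floor is met."""
    problems = []
    if len(copies) < 3:
        problems.append(f"need 3 copies, found {len(copies)}")
    media = {c.medium for c in copies}
    if len(media) < 2:
        problems.append(f"need 2 media types, found {len(media)}")
    # "Offsite" here means any location other than the primary's.
    if not any(c.location != primary_location for c in copies):
        problems.append("need 1 offsite copy, found 0")
    return problems


# Hypothetical inventory: two on-site copies and one in another cloud.
copies = [
    BackupCopy("nightly-dump", "local-disk", "dc-east"),
    BackupCopy("volume-snapshot", "block-snapshot", "dc-east"),
    BackupCopy("dump-replica", "object-storage", "cloud-b-west"),
]
print(check_3_2_1(copies, primary_location="dc-east"))  # prints []
```

A check like this could run in CI or a scheduled job against whatever inventory you already keep; it says nothing about whether the copies restore, which is what the verification sections below cover.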
Four-component implementation
- 1. Logical dumps (pg_dump, mysqldump).
- 2. Physical snapshots (volume-level).
- 3. Continuous WAL/binlog archive (enables point-in-time recovery between dumps).
- 4. Offsite copies (different region/cloud).
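As one illustration of components 1 and 4, the sketch below builds a `pg_dump` command in custom format and checksums the resulting file so an offsite copy can be compared against the original. The helper names `pg_dump_command` and `sha256_of`, and the database name and paths in the usage note, are assumptions for the example; snapshots and WAL archiving depend on your storage and database setup and are not shown.

```python
import hashlib
from pathlib import Path


def pg_dump_command(dbname: str, out_file: str) -> list[str]:
    """Build a pg_dump invocation in custom format (restorable with pg_restore)."""
    return ["pg_dump", "--format=custom", f"--file={out_file}", f"--dbname={dbname}"]


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Checksum a file in chunks so large dumps are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The command list can be passed to `subprocess.run(cmd, check=True)`, for example with a hypothetical `pg_dump_command("appdb", "/backups/appdb.dump")`. Comparing `sha256_of` on the local dump and on the offsite copy after download is a cheap integrity check, but it does not replace a restore drill.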
Verification cadence
An untested backup is a guess. Quarterly restore rehearsals turn the backup from a hope into a tool with a measured RTO.
- Quarterly drill. Run a full restore in a separate environment, never against the production DB, and follow the current runbook rather than an outdated copy.
- Real surprise budget. The first restore during a real failure almost always takes longer than expected; rehearse so the surprises surface in a drill, not in an incident.
- Different team member. Rotate who runs the drill; the restore should not depend on one engineer's tribal knowledge.
- Document the wall-clock time. Record the actual elapsed time of each drill; this measured figure is the RTO, not the marketing number.
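Recording the wall-clock time is easiest when the drill itself is wrapped in a timer. This is a minimal sketch under the assumption that the restore steps can be invoked as a single Python callable; `DrillResult` and `run_timed_drill` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DrillResult:
    operator: str           # who ran the drill, to support rotation
    elapsed_seconds: float  # measured wall-clock duration
    succeeded: bool


def run_timed_drill(operator: str, restore: Callable[[], None]) -> DrillResult:
    """Run a restore callable and record elapsed time, even if it fails."""
    start = time.monotonic()  # monotonic clock is unaffected by system time changes
    try:
        restore()
        succeeded = True
    except Exception:
        succeeded = False
    return DrillResult(operator, time.monotonic() - start, succeeded)
```

A failed drill still yields a duration, which is useful: time spent before the failure is part of the real recovery picture. Storing the `operator` field makes it visible when the same person has run every drill.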
Restore-time SLO
Without an explicit restore-time SLO, recovery becomes a panicked guess during incidents. The SLO turns it into a planned operation.
- Set the SLO. Use explicit numbers, for example 4 hours for a full DB restore and 30 minutes for point-in-time recovery.
- Track attainment. Each quarterly drill produces a measurement; trend it like any other reliability metric.
- Alert on missed drills. If a quarter passes without a rehearsal, page the team lead; do not let the discipline drift.
- Stakeholder visibility. Share the SLO and attainment with product and finance; recovery time has business consequences.
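The SLO checks above reduce to a few comparisons. The sketch below uses the example targets from this section (4 hours and 30 minutes); approximating "a quarter" as 92 days is an assumption, and the function names are illustrative. Wiring `drill_overdue` to an actual pager is left to whatever alerting system you use.

```python
from datetime import date, timedelta

FULL_RESTORE_SLO = timedelta(hours=4)
PITR_SLO = timedelta(minutes=30)
MAX_DRILL_GAP = timedelta(days=92)  # assumption: one quarter approximated as 92 days


def meets_slo(elapsed: timedelta, slo: timedelta) -> bool:
    """True when a measured restore finished within the target."""
    return elapsed <= slo


def drill_overdue(last_drill: date, today: date) -> bool:
    """True when more than a quarter has passed since the last rehearsal."""
    return today - last_drill > MAX_DRILL_GAP


def attainment(drills: list[timedelta], slo: timedelta) -> float:
    """Fraction of drills that met the SLO; 0.0 when there are no drills."""
    if not drills:
        return 0.0
    return sum(meets_slo(d, slo) for d in drills) / len(drills)
```

Returning 0.0 for an empty history is a deliberate choice here: with no drills on record, there is no evidence the SLO can be met.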
Antipatterns
- Backup without restore test. Treat an untested backup as nonexistent.
- One copy. A single point of failure.
- Same region as primary. A regional outage takes out both the primary and the backup.
What to do this week
Three moves. (1) Audit your most critical database against the 3-2-1 rule. (2) Run one full restore in a separate environment and record the wall-clock time. (3) Document the result and any gaps in the runbook so the next drill inherits the knowledge.