The Snapshot Frequency Matrix for Recovery

Snapshot frequency sets the RPO you can actually achieve. This matrix picks the right cadence for each workload class.

RPO drives the cadence

Recovery Point Objective (RPO) is the maximum acceptable data loss for a service, measured as a window of time. Snapshot frequency directly bounds RPO: daily snapshots mean a worst-case RPO of 24 hours; hourly snapshots mean 1 hour; snapshots every 15 minutes mean 15 minutes plus reconstruction. For targets below 15 minutes, continuous replication is usually cheaper and tighter than more frequent snapshots.
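A minimal sketch of that rule, assuming only the three cadences named above (daily, hourly, every 15 minutes). The names CADENCES and snapshot_interval_for are illustrative and do not come from any particular tool.

```python
from datetime import timedelta
from typing import Optional

# Cadences named in the text, longest first (assumed to be the only options).
CADENCES = [timedelta(hours=24), timedelta(hours=1), timedelta(minutes=15)]


def snapshot_interval_for(target_rpo: timedelta) -> Optional[timedelta]:
    """Return the longest cadence whose worst-case loss fits the target RPO.

    Returns None when the target is tighter than 15 minutes, which the text
    treats as the point where continuous replication is the better fit.
    """
    for cadence in CADENCES:
        if cadence <= target_rpo:
            return cadence
    return None


# Example: a 4-hour RPO target is met by hourly snapshots, not daily ones.
choice = snapshot_interval_for(timedelta(hours=4))
```

Picking the longest cadence that still fits keeps snapshot count, and so cost, as low as the target allows.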

The matrix by criticality

Snapshot strategy maps to criticality tier. Critical (financial transactions, user data): continuous replication plus daily snapshots for archive, RPO of seconds. Important (production databases, core configs): hourly snapshots plus continuous replication, RPO of at most 1 hour. Standard: daily snapshots, RPO of 24 hours. Low-criticality: daily or weekly snapshots, with recovery from source.
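The matrix can be kept as data, so tooling and reviews read from one place. This is a sketch only: the tier keys, the TierPolicy fields, and policy_for are hypothetical names, and the low-criticality RPO is left unset because the text does not state one.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TierPolicy:
    snapshot_cadence: str          # how often snapshots are taken
    continuous_replication: bool   # whether replication runs alongside
    rpo: Optional[str]             # stated RPO, or None if the text gives none


# Values copied from the matrix above; keys are illustrative labels.
MATRIX: Dict[str, TierPolicy] = {
    "critical": TierPolicy("daily", True, "seconds"),
    "important": TierPolicy("hourly", True, "1 hour"),
    "standard": TierPolicy("daily", False, "24 hours"),
    "low": TierPolicy("daily or weekly", False, None),
}


def policy_for(tier: str) -> TierPolicy:
    """Look up a tier's policy, failing loudly on an unknown tier."""
    try:
        return MATRIX[tier]
    except KeyError:
        raise ValueError(f"unknown tier: {tier!r}") from None
```

Failing on an unknown tier is deliberate: a workload with no assigned tier should be caught in review, not silently given a default.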

Retention policy per tier

Retention scales with tier. Critical: 30 days hot, 90 days warm, 7 years cold for compliance; the cost is real but justified. Important: 14 days hot, 30 days warm, which covers most recovery scenarios. Standard: 7 days hot, 30 days warm. Low-criticality: 7 days; cheap to keep and cheap to lose.
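One way to apply these numbers is a lookup from snapshot age to storage class. The sketch below makes two assumptions the text does not settle: the day counts are cumulative cutoffs measured from snapshot creation (hot until day 30, warm until day 90, and so on), and the single 7-day window for low-criticality data is treated as hot storage. All names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# (storage class, age cutoff in days), ordered from newest to oldest.
# ASSUMPTION: cutoffs are cumulative ages, not consecutive durations.
RETENTION_DAYS: Dict[str, List[Tuple[str, int]]] = {
    "critical": [("hot", 30), ("warm", 90), ("cold", 7 * 365)],
    "important": [("hot", 14), ("warm", 30)],
    "standard": [("hot", 7), ("warm", 30)],
    "low": [("hot", 7)],  # ASSUMPTION: the 7-day window is hot storage
}


def storage_class(tier: str, age_days: int) -> Optional[str]:
    """Return where a snapshot of this age belongs, or None if it has expired."""
    for name, cutoff in RETENTION_DAYS[tier]:
        if age_days < cutoff:
            return name
    return None  # past the last cutoff: eligible for deletion
```

If your policy instead counts each period consecutively (30 days hot, then a further 90 days warm), the cutoffs would need to be summed before use.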

Test recovery, not just snapshots

A snapshot that exists is not necessarily a snapshot that restores. Once a quarter, pick a snapshot, restore it to a clean environment, verify integrity, and time the recovery. Timing matters because RTO compounds with RPO: a 1-hour RPO plus an 8-hour RTO leaves up to 9 hours of customer impact. Document the procedure as you test it, because untested procedures fail under pressure.
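The drill and the arithmetic can be sketched as below. The restore and verify callables are hypothetical stand-ins for whatever your platform uses to restore a snapshot and check its integrity; nothing here calls a real backup API.

```python
import time
from datetime import timedelta
from typing import Callable, Tuple


def timed_restore_drill(
    restore: Callable[[], None],
    verify: Callable[[], bool],
) -> Tuple[bool, timedelta]:
    """Run a restore, verify the result, and report how long it all took."""
    start = time.monotonic()  # monotonic clock is unaffected by wall-clock changes
    restore()
    ok = verify()
    elapsed = timedelta(seconds=time.monotonic() - start)
    return ok, elapsed


def worst_case_impact(rpo: timedelta, rto: timedelta) -> timedelta:
    """Worst-case customer impact: lost-data window plus time to recover."""
    return rpo + rto


# The example from the text: 1-hour RPO plus 8-hour RTO.
impact = worst_case_impact(timedelta(hours=1), timedelta(hours=8))
```

Recording the measured duration from each quarterly drill gives an RTO based on evidence instead of an estimate.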

Cost considerations

Snapshot cost scales with frequency, retention, and data size. A 1 TB database with hourly snapshots and 30-day retention holds 720 snapshots (24 per day × 30 days), which is real money. Incremental snapshots help because most cloud providers store only changed blocks. Cross-region replicated snapshots roughly double the cost but provide region-failure protection.
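A rough estimate of stored volume, as a sketch under stated assumptions: the first snapshot is a full copy, every later snapshot stores a fixed fraction of the data as changed blocks, and a cross-region copy doubles the total. The change fraction is a made-up input you would measure for your own workload, and the function names are illustrative.

```python
def snapshot_count(per_day: int, retention_days: int) -> int:
    """Number of snapshots held at steady state."""
    return per_day * retention_days


def incremental_storage_gb(
    size_gb: float,
    count: int,
    change_fraction: float,
    cross_region: bool = False,
) -> float:
    """Estimate stored GB: one full copy plus changed blocks per later snapshot.

    ASSUMPTION: each incremental snapshot stores size_gb * change_fraction.
    """
    if count <= 0:
        return 0.0
    stored = size_gb + (count - 1) * size_gb * change_fraction
    return stored * 2 if cross_region else stored


# The example from the text: hourly snapshots kept for 30 days.
count = snapshot_count(per_day=24, retention_days=30)
```

For the 1 TB example (taken as 1000 GB) with a hypothetical 0.1% change per snapshot, this comes to about 1,719 GB, against 720,000 GB if every snapshot were a full copy. Multiply by your provider's per-GB price to get a monthly figure.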