Kubernetes StatefulSet Operations: Backups, Upgrades, and Reschedule Risk
Most StatefulSet incidents are scheduling incidents in disguise. The patterns are well-documented; the discipline is rare.
Why StatefulSets are different
Deployments treat pods as interchangeable. StatefulSets do not: each pod has a stable identity, a stable DNS name, and persistent storage. That stability is what lets you run Postgres or Kafka in the cluster, but it also means scheduling decisions matter much more.
Most StatefulSet outages share a root cause: the cluster decided to reschedule a pod, and the new pod could not reattach its persistent volume in time. The PVC was on a node-local disk, the pod landed in a different AZ, or the storage class did not support cross-node attach.
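One common mitigation is to delay volume provisioning until the pod has actually been scheduled, so the volume is created in whatever AZ the pod lands in. A minimal sketch, assuming the AWS EBS CSI driver; the class name is illustrative:

```yaml
# Sketch: WaitForFirstConsumer delays PV provisioning until the pod is
# scheduled, so the volume is created in the pod's AZ rather than a random one.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-wait-for-pod   # illustrative name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Note this helps with initial placement; it does not make an already-provisioned single-AZ volume follow a pod to another AZ.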
Backups that survive a delete
- Volume snapshots are the table-stakes backup. Set them up before you need them; test restore once a quarter or assume they do not work.
- For databases, application-level backups (pg_dump, Cassandra nodetool snapshot) are still required. Volume snapshots capture state-in-flight; logical backups capture state-at-checkpoint. You need both.
- The hard test: rehearse the full delete + restore on a non-prod cluster. Most teams discover their backup process is incomplete only after the real incident. Rehearsal is the only way around this.
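As a sketch of the snapshot half of that story, here is what a volume snapshot request looks like, assuming a CSI driver with snapshot support and an installed VolumeSnapshotClass; the class and PVC names are illustrative, following the StatefulSet's `<claim-template>-<name>-<ordinal>` PVC naming convention:

```yaml
# Hypothetical snapshot of the PVC behind ordinal 0 of a "postgres" StatefulSet.
# Assumes a VolumeSnapshotClass named "csi-snapclass" exists in the cluster.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-data-0-snap
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data-postgres-0
```

Restore works the other way around: create a new PVC whose `dataSource` references the snapshot. That restore path is exactly what the quarterly rehearsal should exercise.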
Upgrades without surprise reschedules
updateStrategy: RollingUpdate with a sensible partition value lets you upgrade one replica at a time and gate progression on health. Without partitioning, a bad image that still passes readiness rolls to every replica, and your database is down.
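A minimal sketch of a partitioned rollout, with illustrative names; only ordinals at or above the partition receive the new revision, so you lower the partition step by step as canaries prove healthy:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres            # illustrative
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels: {app: postgres}
  template:
    metadata:
      labels: {app: postgres}
    spec:
      containers:
        - name: postgres
          image: postgres:16   # the image being rolled out
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # only ordinals >= 2 update; pods 0 and 1 stay on the old revision
```

With replicas: 3 and partition: 2, only pod postgres-2 updates. Verify it, then patch the partition down to 1, then 0.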
Pod disruption budgets scoped to the StatefulSet's pods prevent voluntary disruptions from taking too many replicas at once. Set minAvailable to N-1 for a database; the cluster will refuse to drain a node if doing so would breach the budget.
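For a 3-replica database, that rule gives minAvailable: 2. A sketch, with illustrative names and labels:

```yaml
# Sketch: blocks any voluntary disruption (node drain, eviction API) that
# would leave fewer than 2 of the 3 replicas running.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb   # illustrative
spec:
  minAvailable: 2      # N-1 for replicas: 3
  selector:
    matchLabels:
      app: postgres    # must match the StatefulSet's pod labels
```

PDBs only guard voluntary disruptions; they do nothing for node crashes or OOM kills, which is why the backup rehearsal still matters.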
PVC handling on rollout
- PVCs persist by default when their pods are deleted.
- persistentVolumeClaimRetentionPolicy controls what happens on StatefulSet scale-down or delete: set whenScaled: Retain for databases, and whenDeleted: Retain for data that should outlive the StatefulSet.
- Watch the storage class default. EBS volumes (gp2 and gp3 alike) are confined to a single AZ; if your StatefulSet pod reschedules to another AZ, the volume cannot follow. Either use a multi-AZ-aware storage class or pin the StatefulSet to one AZ.
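A sketch of the retention policy fields (available as beta from Kubernetes 1.27; the name and omitted spec fields are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres   # illustrative
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Retain    # keep PVCs when replicas shrink
    whenDeleted: Retain   # keep PVCs when the StatefulSet itself is deleted
  # ...rest of spec unchanged (serviceName, selector, template, volumeClaimTemplates)
```

Retain/Retain is the safe default for databases; Delete is only appropriate for data you can regenerate.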
Antipatterns
- Treating StatefulSet as a Deployment. The semantics differ. The operations differ.
- No backup rehearsal. Untested backups are a single point of failure with extra steps.
- Local-only storage in production. Convenient at install; brittle by week three.
What to do this week
Three moves. (1) Identify every StatefulSet in your cluster; verify each has a tested backup pipeline. (2) Add PDBs to each with minAvailable equal to (N-1). (3) Schedule one StatefulSet restore rehearsal for next sprint.
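Steps (1) and (2) can start from a quick inventory. A hedged sketch using standard kubectl flags; adapt namespaces and output parsing to your cluster:

```shell
# (1) Enumerate every StatefulSet, with namespace and replica count.
kubectl get statefulsets --all-namespaces \
  -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,REPLICAS:.spec.replicas

# (2) List existing PDBs; cross-check that each StatefulSet above has one
#     whose selector matches its pod labels.
kubectl get pdb --all-namespaces
```

Any StatefulSet that appears in the first list but has no matching PDB in the second is a candidate for this week's work.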