By Samson Tanimawo, PhD. Published Jan 14, 2026.

Deploy Rollback Discipline

Rolling back a deploy should take one command.

Speed targets

Rollback complete in under 60 seconds for application deploys. Anything slower means extended customer impact.

Rollback complete in under 5 minutes for infrastructure changes. Database migrations and config changes are slower, but the time should still be bounded.

An untested rollback is theatre. The first time a rollback procedure runs should not be in production.

Automation patterns

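One-command rollback. Examples: kubectl rollout undo for Kubernetes Deployments, terraform apply run against the last known-good configuration (terraform plan only previews the change; it does not apply it), or a vendor-specific rollback API.

Auto-rollback on metric breach. Argo Rollouts, Flagger, and vendor canary tools support this. It catches regressions before humans see them.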
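A minimal sketch of what a one-command wrapper could look like for a Kubernetes Deployment, assuming kubectl is installed and already pointed at the right cluster. The function names, the deployment and namespace values, and the 60-second timeout are illustrative assumptions, not part of any existing tool. The run parameter is injectable so the wrapper can be exercised in a drill without touching a cluster.

```python
import subprocess
from typing import Callable, List, Optional


def build_rollback_command(
    deployment: str, namespace: str, to_revision: Optional[int] = None
) -> List[str]:
    """Build the `kubectl rollout undo` command for a Deployment."""
    cmd = [
        "kubectl", "rollout", "undo",
        f"deployment/{deployment}",
        "--namespace", namespace,
    ]
    if to_revision is not None:
        # Without this flag, kubectl rolls back to the previous revision.
        cmd.append(f"--to-revision={to_revision}")
    return cmd


def build_status_command(
    deployment: str, namespace: str, timeout_seconds: int = 60
) -> List[str]:
    """Build the command that waits for the rollback to finish."""
    return [
        "kubectl", "rollout", "status",
        f"deployment/{deployment}",
        "--namespace", namespace,
        f"--timeout={timeout_seconds}s",
    ]


def rollback(
    deployment: str,
    namespace: str,
    to_revision: Optional[int] = None,
    timeout_seconds: int = 60,
    run: Callable = subprocess.run,
) -> None:
    """Roll back, then block until the rollout completes or times out.

    `run` defaults to subprocess.run; pass a fake in tests or drills.
    """
    run(build_rollback_command(deployment, namespace, to_revision), check=True)
    run(build_status_command(deployment, namespace, timeout_seconds), check=True)
```

Calling rollback("web", "prod") would issue the undo and then wait up to 60 seconds for the rollout to settle, which lines up with the application speed target. Both names are placeholders.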

Human override always available. Auto-rollback can be wrong; humans need a fast escape valve.
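The interaction between automated rollback and the human override can be sketched as a small decision function. This is an illustration only, not how Argo Rollouts or Flagger implement their analysis; the threshold values, the Override states, and the choice of error rate and P95 latency as signals are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Override(Enum):
    """Hypothetical operator override states."""
    NONE = "none"                      # let the automation decide
    FORCE_ROLLBACK = "force_rollback"  # human says: roll back now
    HOLD = "hold"                      # human says: do not roll back


@dataclass(frozen=True)
class Thresholds:
    # Assumed example limits; real values depend on the service's SLOs.
    max_error_rate: float = 0.02
    max_p95_latency_ms: float = 800.0


def should_roll_back(
    error_rate: float,
    p95_latency_ms: float,
    thresholds: Thresholds,
    override: Override = Override.NONE,
) -> bool:
    """Decide whether to roll back; a human override always wins."""
    if override is Override.FORCE_ROLLBACK:
        return True
    if override is Override.HOLD:
        return False
    # No override: roll back when either metric breaches its limit.
    return (
        error_rate > thresholds.max_error_rate
        or p95_latency_ms > thresholds.max_p95_latency_ms
    )
```

Checking the override before the metrics is the design choice that matters here: the automation can be wrong in either direction, so the operator can both force a rollback and hold one off.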

Testing rollback

Quarterly rollback drills. Pick a recent deploy; roll it back in non-prod; verify the procedure works.

Document the procedure. Step by step; copy-pasteable; tested.

Rollback failure is itself an incident. If a rollback fails in production, run a postmortem and fix the procedure.

Constraints and trade-offs

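Database migrations limit rollback. Forward-compatible schema changes preserve the ability to roll back: add the new column, deploy code that reads and writes it, enforce constraints, and only then remove the old column, with each step shipped as its own deploy.

External API contracts limit rollback. If you ship a breaking API change, downstream consumers that have adopted the new contract cannot return to the old one, so rolling back breaks them.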
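A small sketch of why the first step of a forward-compatible change is safe to roll back, using an in-memory SQLite database. The users table, the email column, and the function names are hypothetical. The point is that after adding a nullable column, the previous application version's writes still succeed.

```python
import sqlite3


def old_version_insert(conn: sqlite3.Connection, name: str) -> None:
    """Write path of the previous app version: unaware of `email`."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def new_version_insert(conn: sqlite3.Connection, name: str, email: str) -> None:
    """Write path of the new app version: populates `email`."""
    conn.execute(
        "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
    )


def expand(conn: sqlite3.Connection) -> None:
    """Step 1 of the migration: add the column as nullable, no constraint yet."""
    conn.execute("ALTER TABLE users ADD COLUMN email TEXT")


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    old_version_insert(conn, "ada")                          # before the migration
    expand(conn)                                             # schema change ships first
    new_version_insert(conn, "grace", "grace@example.com")   # new code deployed
    old_version_insert(conn, "linus")                        # app rolled back; still works
    return conn.execute(
        "SELECT name, email FROM users ORDER BY id"
    ).fetchall()
```

Rows written by the old version simply carry NULL in the new column. Enforcing NOT NULL, or dropping the old column, is the point at which rolling the application back stops being safe, which is why those steps come last.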

Stateful services may not be cleanly rollbackable. Plan accordingly; sometimes roll-forward is the only option.

Operating rollback discipline

Rollback metric: time-to-rollback per deploy. Track P95.

Rollback rate: deploys that ended in rollback per week. Healthy: low and predictable. Unhealthy: trending up.
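Both metrics can be computed from a plain deploy log. The sketch below assumes a hypothetical Deploy record with a date and an optional rollback duration. It uses the nearest-rank method for P95 and expresses the weekly rollback rate as a fraction of that week's deploys; normalizing by deploy count is an assumption, and a raw weekly count would work as well.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Deploy:
    day: date
    # Seconds from rollback trigger to completion; None if never rolled back.
    rollback_seconds: Optional[float] = None


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    if not values:
        raise ValueError("p95 of an empty list is undefined")
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def weekly_rollback_rate(
    deploys: Iterable[Deploy],
) -> Dict[Tuple[int, int], float]:
    """Fraction of deploys rolled back, keyed by (ISO year, ISO week)."""
    totals: Counter = Counter()
    rolled_back: Counter = Counter()
    for d in deploys:
        iso = d.day.isocalendar()
        week = (iso[0], iso[1])
        totals[week] += 1
        if d.rollback_seconds is not None:
            rolled_back[week] += 1
    return {week: rolled_back[week] / totals[week] for week in sorted(totals)}


def p95_time_to_rollback(deploys: Iterable[Deploy]) -> float:
    """P95 over the deploys that were actually rolled back."""
    return p95(
        [d.rollback_seconds for d in deploys if d.rollback_seconds is not None]
    )
```

Comparing the P95 against the 60-second and 5-minute targets, and watching the weekly rate for an upward trend, gives the two signals this section describes.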

Run a postmortem for every rollback caused by a regression. The benefit compounds: each rollback feeds back into hardening the deploy process.