Deploy Rollback Policy: The 30-Second Test
If your rollback takes longer than 30 seconds, your incidents are larger than they need to be. The fix is mechanical.
Why 30 seconds
Rollback time bounds your worst-case incident duration. A 30-second rollback caps incidents at minutes; a 30-minute rollback bounds them at hours.
Most teams have rollback tested once a year. The number is always larger than expected.
Four properties of fast rollback
- 1. Single command. Memorable; not multi-step.
- 2. No human judgment in the rollback step.
- 3. Idempotent. Safe to re-run.
- 4. Verifiable. Confirms recovery automatically.
Rehearsal cadence
Quarterly: real rollback in staging from a representative state. Time it.
Annual: real rollback in production during low-traffic window.
Policy that prevents drift
Policy: every deploy that lands without a defined rollback path requires explicit approval.
Slow rollbacks signal complex deploys; address upstream, not in the rollback path.
Antipatterns
- Manual rollback steps. Slow; error-prone at 3am.
- Untested rollback. Discovered broken on incident day.
- Rollback that requires a senior engineer. Knowledge concentration risk.
What to do this week
Three moves. (1) Apply this to one pipeline first. (2) Measure deploy frequency / MTTR before/after. (3) Document the outcome so the next team starts from data.