Rollback Validation: Did It Work?

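A rollback isn't done until it's verified. Validation has three parts: confirming the trigger metric recovered, running smoke tests on the affected flows, and monitoring for regressions that surface later.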

Metrics

Validation starts with the metric that triggered the rollback. Confirm that it has returned to baseline, within minutes rather than hours, before declaring resolution. Slow recovery suggests the rollback did not address the actual cause; declaring "resolved" on partial recovery is how incidents come back ten minutes later with the same shape.
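A minimal sketch of that check, assuming the metric is sampled once per minute after the rollback and that higher values are worse (an error rate, say). The function name, the 5% tolerance, the 10-minute limit, and the three-sample "sustained" rule are all illustrative assumptions, not values from any particular monitoring system.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RecoveryResult:
    recovered: bool                    # True only if recovery was sustained and fast enough
    minutes_to_recover: Optional[int]  # minute at which the sustained run began, if any


def check_recovery(
    samples: Sequence[float],
    baseline: float,
    tolerance: float = 0.05,  # assumed: within 5% of baseline counts as "at baseline"
    max_minutes: int = 10,    # assumed reading of "minutes, not hours"
    sustain: int = 3,         # assumed: consecutive good samples required
) -> RecoveryResult:
    """samples[i] is the metric value i minutes after the rollback."""
    limit = baseline * (1 + tolerance)
    run = 0
    for minute, value in enumerate(samples):
        # Count consecutive samples at or below the limit; any bad sample resets the run.
        run = run + 1 if value <= limit else 0
        if run == sustain:
            start = minute - sustain + 1
            return RecoveryResult(start <= max_minutes, start)
    # Never held at baseline: partial recovery is not recovery.
    return RecoveryResult(False, None)
```

Requiring several consecutive good samples is one way to avoid declaring resolution on a single lucky reading. A series that settles above baseline, or one that only reaches baseline after the time limit, is reported as not recovered.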

Smoke tests

Smoke tests catch what the metric misses. The trigger metric returning to baseline does not prove the user-facing flow works; some bugs stay below the alerting threshold yet still break a flow such as checkout. After every rollback, run the affected user flows, with synthetic checks before real traffic where possible.
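The ordering can be sketched as a small runner. Everything here is hypothetical: `Flow`, `run_smoke_tests`, and the probe callables stand in for whatever synthetic and real-traffic checks a team already has, and skipping the real-traffic checks when a synthetic one fails is an assumed policy.

```python
from typing import Callable, List, NamedTuple


class Flow(NamedTuple):
    name: str
    synthetic: bool            # True for scripted probes, False for checks on real traffic
    check: Callable[[], bool]  # hypothetical probe; returns True when the flow works


def _failed(flows: List[Flow]) -> List[str]:
    failed = []
    for flow in flows:
        try:
            ok = flow.check()
        except Exception:
            ok = False  # a crashing probe counts as a failed flow
        if not ok:
            failed.append(flow.name)
    return failed


def run_smoke_tests(flows: List[Flow]) -> List[str]:
    """Return the names of failed flows; an empty list means every flow passed."""
    synthetic = [f for f in flows if f.synthetic]
    real = [f for f in flows if not f.synthetic]
    failures = _failed(synthetic)
    if failures:
        return failures  # assumed policy: do not touch real traffic if synthetic checks fail
    return _failed(real)
```

Running the synthetic probes first keeps a still-broken flow from being exercised against real users; only a clean synthetic pass moves on to the real-traffic checks.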

Monitoring

Monitoring after rollback is its own discipline. Some regressions emerge slowly via cache effects, queue drains, or downstream impact, so the watch must extend past metric recovery. Assigning a named monitor owner and temporarily tightening post-rollback alarms catches resurfacing bugs in the first 30 minutes.
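One way to model the temporary tightening, as a sketch only: `PostRollbackWatch` is a hypothetical structure, the 30-minute window comes from the text above, and halving the alarm threshold is an assumed factor chosen for illustration. It also assumes higher metric values are worse.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PostRollbackWatch:
    owner: str                                 # named person responsible for the watch
    rollback_at: datetime
    window: timedelta = timedelta(minutes=30)  # watch window after the rollback
    tighten: float = 0.5                       # assumed: alarm at half the normal threshold

    def active(self, now: datetime) -> bool:
        # The watch covers [rollback_at, rollback_at + window).
        return self.rollback_at <= now < self.rollback_at + self.window

    def threshold(self, normal: float, now: datetime) -> float:
        """Alarm threshold to apply at `now`."""
        return normal * self.tighten if self.active(now) else normal
```

With a normal threshold of 2.0, an alarm evaluated 10 minutes after the rollback would fire at 1.0, and one evaluated 45 minutes after would use 2.0 again. Keeping the owner on the same record as the window makes it explicit who is watching while the tighter alarms are in force.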