The Degraded-Mode Recovery Runbook

Recovering from degraded mode is its own runbook. The steps below are the ones that prevent re-degradation.

Verify root cause fixed

The first recovery step is verification, not restoration. Recovery on top of an unfixed root cause re-fails immediately and burns customer trust twice in one incident. Strict gate: do not start the recovery sequence until a named engineer confirms the cause is actually fixed.
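The gate can be made mechanical rather than left to memory. The following is a minimal sketch, not a prescribed implementation: the `RootCauseSignoff` record and `may_start_recovery` function are hypothetical names introduced here for illustration, and the only rule encoded is the one stated above, that a named engineer has confirmed the fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RootCauseSignoff:
    """Hypothetical sign-off record; field names are assumptions for this sketch."""
    engineer: Optional[str] = None   # a named person, not a team alias
    confirmed_fixed: bool = False    # set only after the fix has been verified


def may_start_recovery(signoff: RootCauseSignoff) -> bool:
    """Return True only when a named engineer has confirmed the root cause is fixed."""
    has_name = bool(signoff.engineer and signoff.engineer.strip())
    return has_name and signoff.confirmed_fixed
```

A recovery script could call `may_start_recovery` first and refuse to proceed when it returns False, so that an anonymous or unconfirmed sign-off blocks the sequence.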

Staged recovery

Recovery is staged, not all-at-once. Restore one feature, watch its metrics for a defined window, then promote the next. Staging catches partial failures: features that come back broken even after the underlying cause is fixed.
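The restore, watch, promote loop can be sketched as follows. This is an illustration under stated assumptions: `restore` and `is_healthy` are hypothetical callbacks standing in for whatever feature-flag and metrics tooling is actually in use, and the window and polling interval are parameters the team would need to choose. Rolling back a feature that fails its window is deliberately left out.

```python
import time
from typing import Callable, List, Sequence


def staged_recovery(
    features: Sequence[str],
    restore: Callable[[str], None],       # hypothetical: re-enables one feature
    is_healthy: Callable[[str], bool],    # hypothetical: checks that feature's metrics
    watch_seconds: float,
    poll_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Restore features one at a time and stop at the first unhealthy one.

    Returns the features that were restored and stayed healthy for the
    whole watch window.
    """
    if poll_seconds <= 0:
        raise ValueError("poll_seconds must be positive")

    promoted: List[str] = []
    for feature in features:
        restore(feature)
        elapsed = 0.0
        # Watch the feature for the defined window before promoting the next one.
        while elapsed < watch_seconds:
            if not is_healthy(feature):
                return promoted  # halt: later stages are not attempted
            sleep(poll_seconds)
            elapsed += poll_seconds
        # Final check at the end of the window.
        if not is_healthy(feature):
            return promoted
        promoted.append(feature)
    return promoted
```

Halting at the first unhealthy stage keeps a partial failure confined to one feature, which is the point of staging. The caller can compare the returned list against the planned order to see where recovery stopped.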

Comms

Comms during recovery mirror the staged restoration. Customers see incremental improvement rather than a single all-clear that they cannot verify against their own experience. In practice this means per-stage status updates, an explicit final all-clear, and a named comms author who provides continuity through long recoveries.
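One way to keep comms in step with the stages is to derive the messages from the recovery state. The sketch below is illustrative only: the `status_updates` function and the message wording are assumptions, not a template this runbook prescribes. It encodes the three elements above: one update per restored stage, an all-clear only when every stage is back, and a required named author.

```python
from typing import List, Sequence


def status_updates(author: str, stages: Sequence[str], restored_count: int) -> List[str]:
    """Build one message per restored stage, plus an all-clear once all are restored."""
    if not author.strip():
        raise ValueError("a named comms author is required")

    total = len(stages)
    messages = [
        f"[{author}] Stage {i}/{total}: {stage} restored."
        for i, stage in enumerate(stages[:restored_count], start=1)
    ]
    # The all-clear is explicit and only sent after the final stage.
    if total > 0 and restored_count >= total:
        messages.append(f"[{author}] All clear: all {total} stages restored.")
    return messages
```

Because the all-clear is only produced after the last stage, a stalled recovery cannot emit it by accident.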