By Samson Tanimawo, PhD. Published Aug 11, 2025.

SLO and Graceful Degradation

Graceful degradation preserves the SLO by serving a degraded but useful response instead of an error.

Modes

Graceful degradation is the difference between a service that fails partially and one that fails totally. When upstream dependencies are struggling or the service itself is under pressure, the right design serves a degraded but useful response instead of an error. The first piece of the design is naming the modes explicitly so the team can reason about which one is currently active.

The standard mode hierarchy:

The point of explicit modes is that the team can decide ahead of time which mode to use under which conditions. Without explicit modes, every incident becomes an improvisation, and improvisations under stress produce worse outcomes than rehearsed degradation.
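One way to make modes explicit is to encode them as an ordered type that every request handler consults. The sketch below is illustrative only: the mode names (FULL, REDUCED, CACHED, STATIC), the recommendations handler, and DEFAULT_ITEMS are assumptions made up for this example, not a list taken from this article.

```python
from collections.abc import Callable
from enum import IntEnum


class Mode(IntEnum):
    """Hypothetical modes, ordered from healthiest (0) to most degraded (3)."""
    FULL = 0      # live, complete results
    REDUCED = 1   # live results, but a smaller and cheaper payload
    CACHED = 2    # last known good results, no live dependency call
    STATIC = 3    # fixed default content that needs no dependency at all


# Hypothetical fallback content served when nothing better is available.
DEFAULT_ITEMS = ["bestseller-1", "bestseller-2"]


def recommendations(
    mode: Mode,
    user_id: str,
    fetch_live: Callable[[str], list[str]],
    cache: dict[str, list[str]],
) -> list[str]:
    """Return the best response the currently active mode allows."""
    if mode == Mode.FULL:
        return fetch_live(user_id)
    if mode == Mode.REDUCED:
        # Still calls the dependency, but trims the response.
        return fetch_live(user_id)[:3]
    if mode == Mode.CACHED:
        # Skips the dependency; falls back to defaults on a cache miss.
        return cache.get(user_id, DEFAULT_ITEMS)
    return DEFAULT_ITEMS
```

Because the modes are ordered integers, "more degraded" and "less degraded" become simple comparisons, which keeps trigger and recovery logic easy to reason about and to rehearse.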

Trigger

The trigger is what flips the service between modes. The discipline is to make this automatic and SLO-aware, not a manual decision in the middle of an incident.

SLO-aware triggers turn graceful degradation from a hopeful design into an active part of the operational practice. The service degrades early enough to preserve the SLO instead of degrading after the SLO has already been blown.
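A minimal sketch of an SLO-aware trigger follows. It uses the common definition of burn rate (observed error rate divided by the error budget, 1 minus the SLO target) and maps it to a mode. The threshold values and the mode names are assumptions chosen for illustration; real thresholds depend on the SLO window and on how much budget the team is willing to spend.

```python
def burn_rate(errors: int, total: int, slo_target: float) -> float:
    """Error rate relative to the error budget; 1.0 means burning exactly on budget."""
    if total == 0:
        return 0.0  # no traffic in the window, nothing is burning
    error_budget = 1.0 - slo_target
    return (errors / total) / error_budget


# Hypothetical thresholds, checked from most to least severe.
THRESHOLDS = [
    (14.4, "static"),
    (6.0, "cached"),
    (2.0, "reduced"),
]


def mode_for(rate: float) -> str:
    """Pick the most degraded mode whose threshold the burn rate has reached."""
    for threshold, mode in THRESHOLDS:
        if rate >= threshold:
            return mode
    return "full"
```

For example, 10 errors in 1,000 requests against a 99.9% target is a burn rate of roughly 10, which selects the "cached" mode under these assumed thresholds, well before the budget is exhausted.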

Recover

The recovery direction is just as important as the degradation direction. Coming back to full mode too aggressively reopens the same conditions that triggered degradation; coming back too slowly leaves the user experience worse than necessary.
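The asymmetry between degrading and recovering can be sketched as a small controller: degrade immediately, but step back toward full mode one level at a time, and only after the signal has stayed clean for a full window. The ModeController class, its clean_window parameter, and the integer levels (0 for full mode, higher for more degraded) are hypothetical names for this example.

```python
class ModeController:
    """Degrades at once; recovers one level per clean window (hysteresis)."""

    def __init__(self, clean_window: int = 3) -> None:
        self.level = 0                    # 0 = full mode, higher = more degraded
        self.clean_window = clean_window  # consecutive clean samples needed to recover
        self._clean_streak = 0

    def update(self, requested_level: int) -> int:
        """Feed the level the trigger currently asks for; return the active level."""
        if requested_level > self.level:
            # Conditions got worse: degrade immediately.
            self.level = requested_level
            self._clean_streak = 0
        elif requested_level < self.level:
            # Conditions look better: count the sample, recover only after a full window.
            self._clean_streak += 1
            if self._clean_streak >= self.clean_window:
                self.level -= 1
                self._clean_streak = 0
        else:
            # Trigger still asks for the current level: any recovery streak is broken.
            self._clean_streak = 0
        return self.level
```

Stepping down one level per window guards against returning to full mode too aggressively; the window length is the knob that trades that safety against time spent in a degraded state.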

Graceful degradation done right preserves your SLO during incidents that would otherwise blow it. The user gets a degraded but useful experience, the team gets time to fix the root cause, and the error budget is protected. Nova AI Ops watches the SLO burn rate, triggers degradation modes when configured thresholds fire, and auto-recovers once the burn-rate signal stays clean through the hysteresis window, so the service protects itself without manual intervention.