Graceful Degradation as a Default Behaviour
Hard failures are easier to write but worse for customers. This piece covers the four patterns that make degradation the default, and what they cost in code complexity.
Four patterns
Graceful degradation is not one technique; it is four. Each pattern fits a different failure mode, and a robust system uses all four.
- Default values. When a downstream service is down, return a sensible default; the user sees a generic but reasonable result instead of an error.
- Cached responses. When the cache cannot refresh, serve the stale value with a clear age signal, and extend the TTL for as long as the failure lasts.
- Read-only mode. When the database is unreachable for writes, keep serving reads; a partial product is better than a total outage.
- Feature reduction. When a non-critical service is down, quietly hide its surface in the UI; do not expose vendor names in error messages.
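The first two patterns can be combined in a single wrapper. The following is a minimal sketch, not a definitive implementation: the names DegradingCache and Result, the source labels, and the injectable clock are all assumptions made for illustration. It serves a cached value within its TTL, falls back to the stale value (reporting its age) when a refresh fails, and returns a default only when nothing has ever been cached.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class Result(Generic[T]):
    value: T
    source: str         # "fresh", "stale", or "default"
    age_seconds: float  # seconds since the value was fetched; 0.0 for defaults


class DegradingCache(Generic[T]):
    """Hypothetical cache that degrades instead of raising on fetch failure."""

    def __init__(
        self,
        fetch: Callable[[str], T],
        default: T,
        ttl: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._default = default
        self._ttl = ttl
        self._clock = clock  # injectable so tests can control time
        self._store: Dict[str, Tuple[T, float]] = {}

    def get(self, key: str) -> Result[T]:
        now = self._clock()
        entry = self._store.get(key)

        # Happy path: the cached value is still within its TTL.
        if entry is not None and now - entry[1] <= self._ttl:
            return Result(entry[0], "fresh", now - entry[1])

        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                # Cached responses: serve the expired value, with its age.
                return Result(entry[0], "stale", now - entry[1])
            # Default values: nothing cached, so return the default.
            return Result(self._default, "default", 0.0)

        self._store[key] = (value, now)
        return Result(value, "fresh", 0.0)
```

Callers can use the source and age_seconds fields to render an age signal such as "prices as of 5 minutes ago". Catching every Exception is a simplification; a real integration would probably catch only the timeout and connection errors it expects.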
The cost
Degradation is not free. Each fallback path is code, tests, and operational surface that has to be maintained alongside the happy path.
- More code. Each integration needs an explicit fallback path; the codebase grows on every dependency.
- More tests. Fallback paths are rarely exercised in normal traffic; deliberate fault-injection tests are the only honest way to verify them.
- Operational discipline. Fallback paths need separate monitoring; absence of errors does not mean the fallback works.
- Cognitive load. On-call must understand both the happy path and every fallback; runbooks need to cover both.
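The testing and monitoring costs above can be made concrete with a small sketch. It assumes a hypothetical recommendations function with a default-value fallback and an in-process counter standing in for a real metric; none of these names come from a specific library. The test injects a fault on purpose, because normal traffic would rarely reach the fallback branch.

```python
import unittest
from collections import Counter
from typing import Callable, List

# Hypothetical in-process metric; a real system would export this
# to its monitoring stack so fallback usage can be alerted on.
fallback_counter: Counter = Counter()

DEFAULT_RECOMMENDATIONS: List[str] = ["bestsellers"]


def recommendations(user_id: str, fetch: Callable[[str], List[str]]) -> List[str]:
    """Return personalised items, or a generic default if the fetch fails."""
    try:
        return fetch(user_id)
    except Exception:
        # Count every fallback so an outage does not go unnoticed.
        fallback_counter["recommendations"] += 1
        return list(DEFAULT_RECOMMENDATIONS)


class FallbackTest(unittest.TestCase):
    def test_injected_fault_serves_default(self) -> None:
        def broken_fetch(user_id: str) -> List[str]:
            raise TimeoutError("injected fault")

        before = fallback_counter["recommendations"]
        self.assertEqual(recommendations("u1", broken_fetch), DEFAULT_RECOMMENDATIONS)
        self.assertEqual(fallback_counter["recommendations"], before + 1)

    def test_healthy_dependency_skips_fallback(self) -> None:
        before = fallback_counter["recommendations"]
        self.assertEqual(recommendations("u1", lambda _uid: ["a", "b"]), ["a", "b"])
        self.assertEqual(fallback_counter["recommendations"], before)
```

Saved to a file, the tests can be run with python -m unittest. The counter assertion matters as much as the return value: it checks that the fallback is observable, not just that it works.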
When NOT to degrade
Some surfaces should hard-fail. Degrading them creates security holes, financial errors, or wrong answers that violate customer trust.
- Authentication. Failing open creates security holes; hard fail is correct.
- Financial transactions. Half-completed payments are worse than refused payments; reject explicitly when uncertain.
- Data integrity. Hard-fail anywhere a wrong answer is worse than a clear error, such as medical, legal, or regulatory contexts.
- Default rule. If the customer would prefer 'sorry, try later' over a misleading result, hard-fail is the answer.
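One way to keep this decision explicit is a per-surface policy table. The sketch below is illustrative only: the surface names, the OnFailure enum, and the ServiceUnavailable error are assumptions, and treating unlisted surfaces as hard-fail is a design choice of this sketch rather than something the rules above require.

```python
from enum import Enum
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class OnFailure(Enum):
    DEGRADE = "degrade"
    HARD_FAIL = "hard_fail"


class ServiceUnavailable(Exception):
    """Explicit 'sorry, try later' error surfaced to the caller."""


# Hypothetical surface names, for illustration only.
POLICY: Dict[str, OnFailure] = {
    "recommendations": OnFailure.DEGRADE,
    "authentication": OnFailure.HARD_FAIL,
    "payments": OnFailure.HARD_FAIL,
}


def call_with_policy(
    surface: str,
    operation: Callable[[], T],
    fallback: Callable[[], T],
) -> T:
    # Assumption: a surface missing from the table hard-fails, so nobody
    # gets degradation on a sensitive path by accident.
    policy = POLICY.get(surface, OnFailure.HARD_FAIL)
    try:
        return operation()
    except Exception as exc:
        if policy is OnFailure.DEGRADE:
            return fallback()
        # Never fail open: refuse clearly instead of guessing.
        raise ServiceUnavailable(f"{surface} is unavailable; try again later") from exc
```

With this shape, a failing authentication check raises ServiceUnavailable even if a caller passes a permissive fallback, while a failing recommendations call returns whatever the fallback provides.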