Database Failure Modes and Detection

Database failures cluster into a small number of recognizable patterns. Recognizing the pattern is half the fix.

Why failures cluster

Database failures look infinite until you classify them. In practice, most production incidents fall into a small number of recognisable patterns.

Six common modes

Per-mode symptoms

Each failure mode has a distinctive signature in metrics and logs. Knowing the symptoms narrows investigation immediately.

Auto-detection

Each pattern has a unique metric signature. Auto-detection with pattern matching beats human pattern-matching during an incident.

Antipatterns

What to do this week

Three moves. (1) Apply this pattern to your most-loaded table. (2) Measure query latency / write throughput before/after. (3) Document the win and the constraint so the next refactor inherits the knowledge.