AI Failure Modes: A Taxonomy
Production AI systems fail in characteristic ways. Naming and categorizing those failure modes makes the failures debuggable and, sometimes, preventable.
Eight failure modes
- Hallucination: confident wrong output.
- Prompt injection: attacker-controlled content overrides instructions.
- Distribution shift: production inputs diverge from training distribution.
- Cascading errors: one bad agent step corrupts subsequent steps.
- Model regression: provider updates the model, your app breaks.
- Cost runaway: bug or pattern causes 10-100x normal token usage.
- Privacy leakage: model regurgitates training data.
- Tool misuse: agent uses a tool incorrectly with real-world consequences.
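As a sketch, the taxonomy above can be encoded as an enum so incidents get tagged consistently across logs, dashboards, and postmortems (the identifier names are illustrative, not from any particular library):

```python
from enum import Enum

class FailureMode(Enum):
    """The eight failure modes, as machine-readable tags."""
    HALLUCINATION = "hallucination"
    PROMPT_INJECTION = "prompt_injection"
    DISTRIBUTION_SHIFT = "distribution_shift"
    CASCADING_ERRORS = "cascading_errors"
    MODEL_REGRESSION = "model_regression"
    COST_RUNAWAY = "cost_runaway"
    PRIVACY_LEAKAGE = "privacy_leakage"
    TOOL_MISUSE = "tool_misuse"

# Example: tag an incident record with its failure mode.
incident = {"id": "INC-0042", "mode": FailureMode.COST_RUNAWAY}
```

A shared enum keeps the taxonomy from drifting: every alert and every postmortem references the same eight names.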
Detection
Each failure mode has a distinct detection signature. Hallucination → output divergence from grounded sources. Distribution shift → embedding drift. Cost runaway → token-rate alerts. Building specific monitoring for each pays back the first time one fires.
Response patterns
Postmortem template: which failure mode, which detection signal, root cause, fix, and what changed in the eval set. Treat AI incidents like infrastructure incidents, with the same discipline.
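The template above can be captured as a structured record so every postmortem carries the same fields (the field names, and the values in the example, are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class AIPostmortem:
    """One incident writeup, mirroring the postmortem template."""
    failure_mode: str                  # which of the eight failure modes
    detection_signal: str              # which monitor fired (or should have)
    root_cause: str
    fix: str
    eval_changes: list[str] = field(default_factory=list)  # what changed in the eval set

# Hypothetical example incident:
pm = AIPostmortem(
    failure_mode="cost_runaway",
    detection_signal="token-rate alert",
    root_cause="retry loop re-sent the full context on every attempt",
    fix="capped retries and truncated context on retry",
    eval_changes=["added a regression check on tokens per request"],
)
```

Making `eval_changes` a required part of the record enforces the discipline the text calls for: an incident isn't closed until the eval set reflects it.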