The 15-Minute Incident Rule
If you cannot describe what is happening within 15 minutes, declare an incident. This page covers the rule and the discipline behind it that drives faster MTTR.
The rule
The 15-minute rule converts hesitation into action. If the on-call engineer does not understand the problem within 15 minutes of detection, they declare an incident. Declaring unblocks resources, opens a war room, and pulls in additional responders; it is the trigger, not the verdict.
- 15 minutes from detection. Explicit time threshold; forces a decision instead of drift, regardless of how confident the engineer feels.
- Declaration opens resources. Incident commander, war room, comms channels, and additional responders all unlock on declaration; that is the operational point.
- Published rule per team. Visible "15-minute rule" reference in the on-call runbook; new engineers inherit the bar without negotiation.
- Documented "what counts as understood". Explicit criteria for when investigation has identified the cause clearly enough; catches the over-confident "I've got this" pattern.
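The rule above can be sketched as a small decision function. This is a minimal illustration, not a prescribed implementation: the `Understanding` checklist and its three fields are hypothetical stand-ins for a team's own documented "what counts as understood" criteria, and `should_declare` is an assumed name. Note that the engineer's confidence is deliberately not an input.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

THRESHOLD = timedelta(minutes=15)  # the rule's explicit time threshold


@dataclass
class Understanding:
    """Hypothetical "what counts as understood" checklist; replace with the team's own criteria."""
    cause_identified: bool = False
    blast_radius_known: bool = False
    mitigation_in_progress: bool = False

    def is_understood(self) -> bool:
        # Every criterion must hold; a partial answer does not count as understood.
        return (
            self.cause_identified
            and self.blast_radius_known
            and self.mitigation_in_progress
        )


def should_declare(
    detected_at: datetime,
    now: datetime,
    understanding: Understanding,
    threshold: timedelta = THRESHOLD,
) -> bool:
    """Return True once the threshold has elapsed and the problem is still not understood."""
    elapsed = now - detected_at
    return elapsed >= threshold and not understanding.is_understood()


if __name__ == "__main__":
    detected = datetime(2024, 1, 1, 9, 0)
    partial = Understanding(cause_identified=True)  # cause known, the rest still open
    print(should_declare(detected, detected + timedelta(minutes=16), partial))
```

In this sketch a partially completed checklist at minute 16 still triggers a declaration, which mirrors the intent of catching the "I've got this" pattern.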
Why 15 minutes
The 15-minute window is empirical. The cause is most likely to be identified in the first 15 minutes of investigation; hesitation past that point usually means the engineer needs help, not more time.
- First 15 minutes most productive. Early-investigation window when context is freshest; beyond that, complexity grows and signal degrades.
- Hesitation costs minutes of MTTR. Every minute of dawdling past the threshold is a minute of customer impact; the rule removes the hesitation.
- MTTR data per team. Historical data on how often the cause was identified within the first 15 minutes backs the rule with evidence; teams that publish this number stop arguing about the threshold.
- Time-anchored "what do we know" check. 15-minute checkpoint forces the explicit "what do we know versus what do we need" question; the answer drives declaration or continuation.
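One way a team might produce the per-team number mentioned above is to compute the share of past incidents whose cause was identified within the threshold. The sketch below assumes a hypothetical input format (minutes from detection to cause identified, with `None` where the cause was never found during the response); the sample values are made-up placeholders, not real incident data.

```python
from typing import Optional, Sequence


def share_identified_within(
    minutes_to_cause: Sequence[Optional[float]],
    threshold_minutes: float = 15,
) -> float:
    """Fraction of incidents whose cause was identified within the threshold.

    A None entry means the cause was not identified during the response,
    so it counts against the share.
    """
    if not minutes_to_cause:
        raise ValueError("no incidents to summarise")
    hits = sum(
        1 for m in minutes_to_cause
        if m is not None and m <= threshold_minutes
    )
    return hits / len(minutes_to_cause)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    sample = [4, 9, 14, 22, 47, None, 12, 31]
    print(f"{share_identified_within(sample):.0%} identified within 15 minutes")
```

Publishing this figure alongside the runbook gives the 15-minute checkpoint a team-specific basis rather than a borrowed one.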
Avoid
Two failure modes break the rule: treating declaration as a personal failure, and waiting for certainty before declaring. Both are cultural, not procedural; fixing them takes leadership reinforcement.
- Declaration is not failure. No-blame-for-declaring norm; declaration is the right call when investigation is stalling, not a sign of weakness.
- Do not wait for certainty. Certainty comes after declaration, not before; waiting for it costs MTTR every time.
- Cultural reinforcement at the postmortem. Celebrate "declared correctly" outcomes in retros; the team learns from the example more than from the rule text.
- False-alarm tolerance. Explicit "false declaration is fine" norm; over-cautious cultures suppress declarations that would have helped.