The Incident Severity Creep Pattern
Sev 3 incidents that should have been sev 2. The pattern, the cost, and the calibration that prevents it.
Symptom
Severity creep shows up as a distorted incident-count curve where most incidents land at the lowest severity that lets responders avoid escalation. The dashboard looks calm; customers see something worse. The gap between internal severity and customer impact is the diagnostic signal.
- Sev 3s outnumber sev 1s and 2s combined. Distribution check per quarter. The team is systematically under-classifying.
- Customer impact larger than severity suggests. Impact-vs-severity gap per incident. Direct symptom of mis-classification.
- Quarterly published distribution. Severity histogram visible per quarter. Catches drift before it normalises.
- Customer-feedback signal per incident. Customer-side reaction captured. Catches when customers see worse than the severity says.
Cause
The cause is a mix of vague definitions and team norms. Engineers default to the safer-feeling lower severity because escalation feels expensive socially even when it is the right call. Vague definitions plus social cost equals systematic under-classification.
- Vague severity definitions. Unclear bar between sev 2 and sev 3 per org. Each engineer applies their own threshold.
- "Do not escalate" team norm. Over-cautious classification culture per team. Drives systematic under-classification.
- Cost-of-escalation perception. "I do not want to wake the manager" instinct per engineer. Catch through training and explicit norms.
- No-blame escalation culture. "Escalating is the right call" norm published org-wide. Supports honest classification rather than defensive under-classification.
Fix
The fix is specific definitions plus periodic calibration. Vague rules produce vague decisions; concrete thresholds plus quarterly calibration on a 30-random-incident sample produces consistent classification.
- Specific definitions per severity level. Named customer-impact threshold per level. Removes guesswork.
- Periodic calibration. 30-random-incident review per quarter. Verifies the team applies the rules consistently.
- Documented reasoning per incident. "Why this severity" note captured. Supports later calibration and audit.
- Quarterly rule update. Definition refinement per quarter. Catches definitions that do not survive contact with reality.