Incident Severity Rubric That Survives Real Pressure
Most severity rubrics fall apart in the moment. This four-quadrant model holds up at 3am and produces consistent decisions across teams.
The two axes
Most severity rubrics fall apart under real pressure because they bundle too many factors. The two-axis model holds because it asks two clear questions and produces consistent sev assignments at 3am.
- Customer impact. How many customers affected, how badly; the user-facing magnitude.
- Time sensitivity. How long until impact escalates without intervention; the response-window constraint.
- Cross the two. Two axes give four quadrants, sev 1 through sev 4; each quadrant has a clear definition.
- Holds at 3am. Two questions, both observable; an exhausted on-call can answer them under pressure.
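The two questions can be written down as a small lookup table. This is a minimal sketch, not a prescribed implementation: the names `Impact`, `Urgency`, `SEVERITY`, and `assign_severity` are hypothetical, and the quadrant-to-sev mapping follows the examples given later in this article.

```python
from enum import Enum


class Impact(Enum):
    """Customer impact axis: how many customers are affected."""
    FEW = "few"
    MANY = "many"


class Urgency(Enum):
    """Time sensitivity axis: does impact escalate without intervention?"""
    STABLE = "stable"
    ESCALATING = "escalating"


# Quadrant table: (customer impact, time sensitivity) -> sev level.
SEVERITY = {
    (Impact.MANY, Urgency.ESCALATING): 1,
    (Impact.MANY, Urgency.STABLE): 2,
    (Impact.FEW, Urgency.ESCALATING): 3,
    (Impact.FEW, Urgency.STABLE): 4,
}


def assign_severity(impact: Impact, urgency: Urgency) -> int:
    """Return the sev level (1-4) for one answer to each question."""
    return SEVERITY[(impact, urgency)]
```

Because the table has exactly four entries and no other inputs, two people answering the same two questions the same way cannot land on different sevs.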
Examples per quadrant
Concrete examples are how the rubric becomes real. Each quadrant has a representative incident shape; engineers calibrate their judgement against the examples.
- Sev 1. Many customers, escalating fast (e.g. login broken globally); all-hands, war room.
- Sev 2. Many customers, stable (e.g. feature degraded but not broken); standard incident response.
- Sev 3. Few customers, escalating fast (e.g. one customer's data at risk); targeted response, faster than sev 4.
- Sev 4. Few customers, stable (e.g. non-critical feature bug); bugfix queue, no incident response.
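One way to keep these examples next to the definitions is to store them as data and render a one-line reference card per sev. The sketch below assumes that approach; `Quadrant`, `QUADRANTS`, and `calibration_card` are hypothetical names, and the field contents are copied from the four examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quadrant:
    sev: int
    impact: str    # "many" or "few" customers
    trend: str     # "escalating" or "stable"
    example: str   # representative incident shape
    response: str  # expected response level


# Contents taken from the four examples above; the structure is illustrative.
QUADRANTS = [
    Quadrant(1, "many", "escalating", "login broken globally", "all-hands, war room"),
    Quadrant(2, "many", "stable", "feature degraded", "standard incident response"),
    Quadrant(3, "few", "escalating", "one customer's data at risk", "targeted response"),
    Quadrant(4, "few", "stable", "non-critical feature bug", "bugfix queue"),
]


def calibration_card(sev: int) -> str:
    """Render one line an on-call engineer can compare an incident against."""
    q = next(q for q in QUADRANTS if q.sev == sev)
    return f"Sev {q.sev}: {q.impact} customers, {q.trend} (e.g. {q.example}) -> {q.response}"
```

Calling `calibration_card(3)` returns the sev 3 line, which could be pinned in an incident channel alongside the rubric link.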
Consistency across teams
The rubric only delivers value when it is applied consistently across teams. Cross-team incidents are far easier to coordinate when sev definitions match; quarterly calibration keeps them aligned.
- Same rubric everywhere. All teams use the same definitions; in a cross-team incident, a sev 2 means the same thing to every responder.
- Quarterly calibration. Review the last 30 incidents; look for sev 2s that should have been sev 1 and sev 4s that should have been sev 3.
- Single doc. Rubric documented once; linked from every incident channel; on the wall in the war room.
- Drift signal. If teams keep miscalibrating in the same direction, the rubric definitions need updating, not the engineers.
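The drift signal can be checked mechanically during the quarterly review. The sketch below is one possible approach, not a standard method: the function name `drift_direction` and both thresholds (`min_miscalls`, `one_sided_share`) are assumptions chosen for illustration and would need tuning per organisation.

```python
from collections import Counter


def drift_direction(reviews, min_miscalls=5, one_sided_share=0.8):
    """Detect one-directional miscalibration in a calibration review.

    reviews: iterable of (assigned_sev, reviewed_sev) pairs, one per incident.
    Returns "under" if miscalls are mostly under-called (e.g. assigned sev 2,
    should have been sev 1), "over" if mostly over-called, else None.
    Thresholds are illustrative assumptions, not values from the rubric.
    """
    counts = Counter()
    for assigned, reviewed in reviews:
        if assigned > reviewed:
            counts["under"] += 1  # larger sev number = less severe than warranted
        elif assigned < reviewed:
            counts["over"] += 1
    total = counts["under"] + counts["over"]
    if total < min_miscalls:
        return None  # too few miscalls to call it a pattern
    for direction in ("under", "over"):
        if counts[direction] / total >= one_sided_share:
            return direction
    return None


# Usage: 30 reviewed incidents, 6 of them under-called.
last_quarter = [(2, 1)] * 4 + [(4, 3)] * 2 + [(3, 3)] * 24
print(drift_direction(last_quarter))
```

A non-`None` result points at the rubric wording rather than at individual engineers, which matches the intent of the drift signal above.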