Alert Priority vs Severity
Two attributes, two different meanings. Both matter.
Priority and severity are different axes
Severity is impact. Sev1 means production is broken for users. Sev3 means a feature is degraded for a small cohort.
Priority is response order. P1 is acted on first; P3 waits. When several incidents share a severity tier (say, three concurrent sev1s), priority sequences the queue.
Conflating the two is a common reason backlog grooming sessions devolve into arguments over labels. A sev3 can still be P1 for a customer with a contractual deadline.
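A minimal sketch of a response queue, assuming both attributes are stored as integers where 1 is the most urgent tier. Priority alone decides the order; using severity as a tiebreaker among equal priorities is an assumption, not something the definitions above require. The `Incident` class, `response_order` helper, and the sample titles are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Incident:
    # Ordering compares (priority, severity): P1 before P3, then sev1 before sev3.
    priority: int
    severity: int
    # compare=False keeps the title out of the ordering.
    title: str = field(compare=False)


def response_order(incidents):
    """Return incidents in the order they should be worked (hypothetical helper)."""
    heap = list(incidents)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]


queue = [
    Incident(priority=2, severity=1, title="Checkout errors, contained to one region"),
    Incident(priority=1, severity=3, title="Report export for a contract deadline"),
    Incident(priority=1, severity=1, title="Login down for all users"),
]
for incident in response_order(queue):
    print(f"P{incident.priority} sev{incident.severity}: {incident.title}")
```

Running it prints the P1/sev1 outage first, then the sev3 work that is P1 because of the contract, and last the contained sev1 that was scheduled as P2.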
How to define severity
Sev1: revenue-impacting, customer-visible, no workaround. Page on-call, open a war room.
Sev2: degraded but not broken. Workaround exists. Ticket, business-hours response.
Sev3: minor or cosmetic. Backlog. Triage at the next standup.
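The three severity definitions can be encoded as a small enum plus a policy table, so tooling applies them the same way every time. This is a sketch under assumptions: the `classify` rule (check the sev1 test first, then "degraded", else sev3) and every name below are hypothetical, chosen only to mirror the wording of the definitions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Impact tiers; a lower number means higher impact."""
    SEV1 = 1  # revenue-impacting, customer-visible, no workaround
    SEV2 = 2  # degraded but not broken, workaround exists
    SEV3 = 3  # minor or cosmetic


@dataclass(frozen=True)
class ResponsePolicy:
    pages_on_call: bool
    response: str


# Hypothetical policy table mirroring the definitions in the text.
RESPONSE_POLICY = {
    Severity.SEV1: ResponsePolicy(True, "page on-call, open a war room"),
    Severity.SEV2: ResponsePolicy(False, "ticket, business-hours response"),
    Severity.SEV3: ResponsePolicy(False, "backlog, triage at the next standup"),
}


def classify(revenue_impacting: bool, customer_visible: bool,
             has_workaround: bool, functionally_degraded: bool) -> Severity:
    """Hypothetical classifier: apply the sev1 test first, then sev2, else sev3."""
    if revenue_impacting and customer_visible and not has_workaround:
        return Severity.SEV1
    if functionally_degraded:
        return Severity.SEV2
    return Severity.SEV3
```

Checking sev1 first matters: an incident that is customer-visible and revenue-impacting but has a workaround falls through to sev2 rather than paging anyone.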
How to define priority
P1: do this right now. Cancels other work.
P2: do this in this sprint. If it slips, it blocks the next sprint.
P3: backlog. Reviewed at planning.
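A minimal sketch of the priority tiers as an enum, assuming "cancels other work" means only a P1 may preempt in-progress work that is not itself P1. The `preempts` and `schedule_slot` helpers are hypothetical illustrations of the definitions, not a prescribed scheduler.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Response-order tiers; a lower number is worked first."""
    P1 = 1  # do right now; cancels other work
    P2 = 2  # do this sprint
    P3 = 3  # backlog; reviewed at planning


def preempts(new: Priority, current: Priority) -> bool:
    """Hypothetical rule: only a P1 interrupts in-progress work, and never another P1."""
    return new is Priority.P1 and current is not Priority.P1


def schedule_slot(priority: Priority) -> str:
    """Map each tier to when it gets picked up, per the definitions above."""
    return {
        Priority.P1: "now",
        Priority.P2: "current sprint",
        Priority.P3: "next planning review",
    }[priority]
```

Two simultaneous P1s do not preempt each other under this rule; the team sequences them, which is where severity can serve as a tiebreaker.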
Mapping the two
Most sev1 incidents are P1. Most sev3 issues are P3. The interesting cases are sev2/P1 (a workaround exists, but the customer pays for a fast fix) and sev1/P2 (production is broken but the impact is contained, and post-incident work is scheduled).
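One way to apply this mapping is to default priority from severity and override only for the off-diagonal cases. A sketch under assumptions: the sev2 to P2 default and the `paid_fast_fix` and `contained` flags are hypothetical, standing in for whatever signals your tracker actually records.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


class Priority(IntEnum):
    P1 = 1
    P2 = 2
    P3 = 3


def default_priority(severity: Severity, *, paid_fast_fix: bool = False,
                     contained: bool = False) -> Priority:
    """Hypothetical mapping: mirror severity, then apply the two off-diagonal cases."""
    if severity is Severity.SEV2 and paid_fast_fix:
        return Priority.P1  # workaround exists, but the customer pays for a fast fix
    if severity is Severity.SEV1 and contained:
        return Priority.P2  # broken but contained; follow-up work gets scheduled
    return Priority(severity.value)  # diagonal: sev1->P1, sev2->P2, sev3->P3
```

Keeping the overrides explicit makes them auditable: every ticket off the diagonal has a named reason.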
Build a small RACI (Responsible, Accountable, Consulted, Informed) matrix that maps each severity/priority pair onto on-call runbooks and backlog grooming.
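A RACI matrix of this kind can live as plain data keyed by (severity, priority). Every role name, venue, and row below is hypothetical; the fallback to the priority's diagonal row is an assumption, chosen because priority decides response order.

```python
from typing import NamedTuple


class Raci(NamedTuple):
    responsible: str
    accountable: str
    consulted: tuple[str, ...]
    informed: tuple[str, ...]
    venue: str  # "on-call runbook" or "backlog grooming"


# Hypothetical rows; keys are (severity, priority) as integers 1-3.
RACI_MATRIX = {
    (1, 1): Raci("on-call engineer", "incident commander", ("service owner",),
                 ("support", "leadership"), "on-call runbook"),
    (1, 2): Raci("service owner", "engineering manager", ("on-call engineer",),
                 ("support",), "backlog grooming"),
    (2, 1): Raci("on-call engineer", "engineering manager", ("account manager",),
                 ("support",), "on-call runbook"),
    (2, 2): Raci("service owner", "engineering manager", (), ("support",),
                 "backlog grooming"),
    (3, 3): Raci("service owner", "product manager", (), (), "backlog grooming"),
}


def lookup(severity: int, priority: int) -> Raci:
    """Return the listed row, or fall back to the priority's diagonal row."""
    return RACI_MATRIX.get((severity, priority), RACI_MATRIX[(priority, priority)])
```

With this fallback, an unlisted sev3/P1 pair routes to the on-call runbook rather than the backlog, matching the idea that priority, not severity, sets response order.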
Both fields are required for any new ticket or alert.
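Enforcing the requirement at creation time is straightforward. A minimal sketch, assuming tickets arrive as dicts with hypothetical `severity` and `priority` keys holding integers 1 to 3; `validate_ticket` is an illustrative name, not an existing API.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


class Priority(IntEnum):
    P1 = 1
    P2 = 2
    P3 = 3


def validate_ticket(ticket: dict) -> list[str]:
    """Return a list of problems; an empty list means the ticket may be created."""
    errors = []
    for name, enum_type in (("severity", Severity), ("priority", Priority)):
        value = ticket.get(name)
        if value is None:
            errors.append(f"missing required field: {name}")
        elif value not in [member.value for member in enum_type]:
            errors.append(f"invalid {name}: {value!r}")
    return errors
```

Returning every problem at once, rather than raising on the first, lets the ticket form show both missing fields in a single pass.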
Standardize before scaling
Pick definitions before adding more teams. Late renames break dashboards and runbooks.
Keep three tiers each. More tiers fragment the data without improving response.
Audit usage quarterly. Tier inflation (everyone tags everything sev1) is the failure mode to watch.
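A quarterly audit can start with one number per team: the share of its tickets tagged sev1. This sketch assumes tickets are exported as (team, severity) pairs; the 25% threshold is an arbitrary placeholder to tune against your own baseline, and both helper names are hypothetical.

```python
from collections import Counter


def sev1_share_by_team(tickets):
    """tickets: iterable of (team, severity) pairs from one quarter."""
    totals = Counter()
    sev1 = Counter()
    for team, severity in tickets:
        totals[team] += 1
        if severity == 1:
            sev1[team] += 1
    return {team: sev1[team] / totals[team] for team in totals}


def flag_inflation(tickets, threshold=0.25):
    """Return teams whose sev1 share exceeds the (hypothetical) threshold."""
    shares = sev1_share_by_team(tickets)
    return sorted(team for team, share in shares.items() if share > threshold)


quarter = [
    ("payments", 1), ("payments", 1), ("payments", 2),
    ("search", 3), ("search", 2), ("search", 1), ("search", 3),
]
print(flag_inflation(quarter))
```

Here payments tags two of three tickets sev1 and is flagged; search sits exactly at the threshold and is not. A flag is a prompt for a conversation, not proof of inflation.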