Alert Action Distinction

The pattern: separate alerts that demand action from alerts that only notify.

Two classes of alert

Action and notification are different surfaces. Action alerts demand a human response within minutes: they page someone, wake them up, and expect a runbook to be followed. Notification alerts inform but require no action right now; they land in Slack, email, or a ticket queue for the next business day. Mixing the two is the root cause of alert fatigue, because a pager that fires for FYI events trains responders to ignore it.

How to classify each rule

Three rules guide classification. First, ask the runbook question: does the responder do something specific in the next 15 minutes? If not, it’s a notification, not a page. Second, use severity tiers explicitly (Sev1 pages on-call, Sev2 opens a ticket, Sev3 emails the owning team) and map each alert to exactly one tier at creation. Third, reject rules without a runbook: no runbook means no action, which means no page.
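The three rules can be sketched as a small check run when a rule is created. This is a minimal illustration, not a prescribed implementation: `AlertRule`, `classify`, and the field names are hypothetical, and demoting a page with no 15-minute action to sev2 (rather than sev3) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Tier names follow the text; the channel descriptions are illustrative only.
TIER_CHANNEL = {
    "sev1": "page on-call",
    "sev2": "open a ticket",
    "sev3": "email the owning team",
}


@dataclass
class AlertRule:
    """Hypothetical description of a rule submitted for review."""
    name: str
    requested_tier: str                 # exactly one of "sev1", "sev2", "sev3"
    runbook_url: Optional[str] = None
    action_within_15_min: bool = False  # answer to the runbook question


def classify(rule: AlertRule) -> str:
    """Return the tier the rule should get, or raise if it must be rejected."""
    if rule.requested_tier not in TIER_CHANNEL:
        raise ValueError(f"{rule.name}: unknown tier {rule.requested_tier!r}")
    if rule.requested_tier != "sev1":
        # Notification tiers are accepted as requested.
        return rule.requested_tier
    if not rule.runbook_url:
        # No runbook means no action, which means no page.
        raise ValueError(f"{rule.name}: paging rule has no runbook")
    if not rule.action_within_15_min:
        # Assumption: a page with nothing to do in 15 minutes becomes a ticket.
        return "sev2"
    return "sev1"
```

A rule that asks for sev1 but answers "no" to the runbook question comes back as sev2, and a sev1 request with no runbook is rejected outright instead of being quietly downgraded.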

Routing the two cleanly

Routing keeps the channels clean. Split Alertmanager receivers by severity label: PagerDuty for sev1, a Slack webhook for sev2, and email for sev3, with no rule sending to more than one tier. Disable mobile push for the notification channel, because the phone is reserved for action. Make ticket creation idempotent by using the alert fingerprint as the ticket key, which avoids duplicates during flapping.
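The routing and idempotency ideas can be illustrated in a few lines. This is a sketch under stated assumptions, not Alertmanager configuration: the receiver names, `route`, and the in-memory `TicketStore` are hypothetical stand-ins for a real routing tree and ticketing API, and the fingerprint is assumed to be a stable string that the alerting system supplies for each alert.

```python
from typing import Dict

# One receiver per severity, so no alert can reach more than one tier.
# Receiver names are placeholders.
RECEIVER_BY_SEVERITY = {
    "sev1": "pagerduty",
    "sev2": "slack-webhook",
    "sev3": "email",
}


def route(labels: Dict[str, str]) -> str:
    """Pick exactly one receiver from the alert's severity label."""
    severity = labels.get("severity")
    if severity not in RECEIVER_BY_SEVERITY:
        raise ValueError(f"missing or unknown severity label: {severity!r}")
    return RECEIVER_BY_SEVERITY[severity]


class TicketStore:
    """Hypothetical in-memory ticket queue keyed by alert fingerprint."""

    def __init__(self) -> None:
        self._tickets: Dict[str, dict] = {}

    def upsert(self, fingerprint: str, summary: str) -> bool:
        """Create a ticket once per fingerprint; return True only on creation."""
        existing = self._tickets.get(fingerprint)
        if existing is not None:
            # A flapping alert re-fires with the same fingerprint:
            # count the occurrence instead of opening a duplicate.
            existing["occurrences"] += 1
            return False
        self._tickets[fingerprint] = {"summary": summary, "occurrences": 1}
        return True

    def occurrences(self, fingerprint: str) -> int:
        return self._tickets[fingerprint]["occurrences"]

    def __len__(self) -> int:
        return len(self._tickets)
```

Keying on the fingerprint means a retry or a flap updates the existing ticket, so the queue grows with distinct problems, not with the number of firings.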

Review cadence

Three review patterns keep classification accurate. Quarterly, scan the action tier and demote any alert that produced no remediation in the last 90 days; promotion in the other direction is rare and needs a post-incident finding. Track the demotion rate as a noise indicator: a team demoting 20% of its action alerts per quarter is signaling that its initial classification is wrong. Audit Slack-only alerts too, because a notification everyone ignores is clutter and should be deleted.
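The quarterly scan reduces to a date comparison and a ratio. The sketch below assumes you can export, for each action alert, the date of the last remediation it triggered (or nothing if it never did); `alerts_to_demote`, `demotion_rate`, and the input shape are hypothetical, and treating 20% as a hard threshold is an illustrative reading of the figure in the text.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional

REVIEW_WINDOW = timedelta(days=90)
NOISE_THRESHOLD = 0.20  # share of action alerts demoted in one quarter


def alerts_to_demote(
    last_remediation: Dict[str, Optional[date]], today: date
) -> List[str]:
    """Return action alerts with no remediation inside the review window."""
    cutoff = today - REVIEW_WINDOW
    return sorted(
        name
        for name, last in last_remediation.items()
        if last is None or last < cutoff
    )


def demotion_rate(demoted: int, total_action_alerts: int) -> float:
    """Fraction of the action tier demoted this quarter."""
    if total_action_alerts == 0:
        return 0.0
    return demoted / total_action_alerts


def classification_is_noisy(demoted: int, total_action_alerts: int) -> bool:
    """True when the demotion rate suggests initial classification is off."""
    return demotion_rate(demoted, total_action_alerts) >= NOISE_THRESHOLD
```

An alert that has never produced a remediation is treated the same as one whose last remediation is older than 90 days, which keeps brand-new but unused pages from lingering in the action tier.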

Default to notification

The default is conservative. When in doubt, classify a new alert as a notification; promotion to action requires evidence that the lower tier missed an incident. This deliberately inverts the common reflex of paging on everything. Skip this exercise if your team has fewer than 30 alerts in total, because the classification overhead outweighs the noise it removes.