Prometheus Alertmanager Routing
Alertmanager's tree-based routing. The patterns that work.
The route tree
Routes form a tree. The top-level route matches everything; sub-routes match more specific labels.
Each route has matchers (label-based filters) and a receiver (where to send the notification).
Inheritance: sub-routes inherit any field they leave unset (receiver, group_by, and the grouping timers) from their parent. Override specific fields at the sub-route level.
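A minimal sketch of such a tree, assuming hypothetical receiver names and a hypothetical team label. The receivers are left as empty stubs so that only the routing shape is shown:

```yaml
# Sketch only: receiver names and label values are hypothetical.
route:
  # Root route: no matchers, so it catches every alert.
  receiver: default-slack
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  routes:
    # Inherits group_by and group_wait from the root; overrides only the receiver.
    - matchers:
        - team="payments"
      receiver: payments-slack
      routes:
        # Evaluated only for alerts that already matched team="payments".
        - matchers:
            - severity="critical"
          receiver: payments-pager
          group_wait: 10s   # overrides the inherited 30s

receivers:
  # Stubs with no integrations configured; real receivers need a config block.
  - name: default-slack
  - name: payments-slack
  - name: payments-pager
```

An alert with team="payments" and severity="warning" would stop at payments-slack; the same alert with severity="critical" would descend one level further to payments-pager.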
Matchers
Label equality: severity="critical". Label regex: service=~"payment.*". Regex matchers are anchored, so the pattern must match the whole label value.
Multiple matchers: AND-combined. service="foo" and severity="critical" must both match.
continue: true: after this sub-route matches, evaluation continues with its later siblings instead of stopping at the first match. Useful when an alert needs multiple receivers.
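The matcher and continue behaviour above could look like this in a config. This is a sketch; the receiver names and label values are made up for illustration:

```yaml
# Sketch only: receiver names and label values are hypothetical.
route:
  receiver: default
  routes:
    # Both matchers must hold (AND). The regex is anchored, so
    # "payment.*" matches "payments-api" but not "prepayment".
    - matchers:
        - service=~"payment.*"
        - severity="critical"
      receiver: payments-pager
      continue: true   # keep evaluating the sibling routes below
    # Matches the same critical payments alert, so it is delivered here too.
    - matchers:
        - severity="critical"
      receiver: incident-channel

receivers:
  # Stubs with no integrations configured.
  - name: default
  - name: payments-pager
  - name: incident-channel
```

Without continue: true on the first sub-route, a critical payments alert would stop there and never reach incident-channel.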
Receiver types
PagerDuty, Slack, email, webhook. Each has its own configuration block.
Receivers can compose: a single named receiver might fire to PagerDuty AND Slack.
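As an illustration of a composed receiver, the fragment below attaches a PagerDuty and a Slack integration to one named receiver. The receiver name, channel, key, and webhook URL are placeholders, not real values:

```yaml
# Sketch only: all names and credentials below are placeholders.
receivers:
  - name: payments-oncall
    pagerduty_configs:
      - routing_key: 'REPLACE_WITH_PAGERDUTY_INTEGRATION_KEY'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK'
        channel: '#payments-alerts'
```

Any route that names payments-oncall as its receiver notifies both integrations.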
Inhibition
Inhibition rules mute lower-priority alerts while a related higher-priority alert is firing. Example: a region-down alert inhibits the per-pod alerts from the same region, which reduces noise.
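A sketch of the region-down case as an inhibit rule. The alert name RegionDown and the scope and region labels are assumptions about how the alerts are labelled:

```yaml
# Sketch only: alert name and labels are hypothetical.
inhibit_rules:
  - source_matchers:
      - alertname="RegionDown"
    target_matchers:
      - scope="pod"
    # Inhibit only pod alerts whose region label equals that of the firing RegionDown alert.
    equal: ['region']
```

The equal list is what keeps an outage in one region from muting pod alerts in another.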
Grouping
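group_by: alerts that share the same values for the listed labels are batched into one notification. Reduces alert spam during incidents.
group_wait: how long to wait for additional alerts before sending the first notification for a new group. 30 seconds is the default and a typical value.
group_interval: how long to wait before sending an update about new alerts in an existing group. 5 minutes is the default and a typical value.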
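These three settings sit together on a route. A minimal sketch, assuming alerts carry alertname and cluster labels and that a receiver named default exists:

```yaml
# Sketch only: the grouping labels and receiver name are assumptions.
route:
  receiver: default
  # One notification per distinct (alertname, cluster) pair.
  group_by: ['alertname', 'cluster']
  # Hold the first notification for a new group so related alerts can join it.
  group_wait: 30s
  # Batch later additions to the group into updates at least this far apart.
  group_interval: 5m

receivers:
  - name: default
```

With this grouping, fifty pods failing the same check in one cluster would produce one notification rather than fifty.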
Operating the routing
amtool config routes test shows which receiver a sample set of labels routes to. Catches misconfigurations before deploy.
Track a delivery-rate metric per route. It surfaces routes that are underused (worth retiring) or overused (worth splitting).
Configuration in git, deployed via CI. UI access read-only; changes go through review.
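One possible shape for the CI check is sketched below as a GitHub Actions workflow. The choice of CI system, the file name alertmanager.yml, the receiver name payments-oncall, and the sample labels are all assumptions, and the runner is assumed to already have amtool on its PATH:

```yaml
# Hypothetical workflow: assumes amtool is installed on the runner
# and that the config lives at alertmanager.yml in the repo root.
name: alertmanager-config
on:
  pull_request:
    paths:
      - alertmanager.yml
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Check config syntax
        run: amtool check-config alertmanager.yml
      - name: Check that a critical payments alert reaches the pager
        # Fails the job if the sample labels do not route to payments-oncall.
        run: >
          amtool config routes test
          --config.file=alertmanager.yml
          --verify.receivers=payments-oncall
          severity=critical service=payments-api
```

Adding one routes test step per critical path turns the routing expectations into regression tests that run on every review.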