Alertmanager Inhibition Rules: A Practical Guide
Inhibition is one of Alertmanager's most underused features. Done well, it can cut cascading-alert noise substantially without losing signal.
What inhibition is
Inhibition: while alert A (the source) is firing, Alertmanager mutes notifications for alert B (the target). The classic case is a node-down alert suppressing all the per-pod alerts on that node.
Without inhibition, an outage of a node running 50 pods can send 50 notifications; with inhibition, only the node-down alert notifies.
Five practical patterns
1. Node-down inhibits pod alerts on that node.
2. Cluster-down inhibits all other alerts in that cluster.
3. Service-down inhibits alerts from services that depend on it.
4. Region-down inhibits alerts specific to that region.
5. Maintenance-active inhibits alerts for the planned impact.
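A minimal sketch of patterns 2 to 5 in Alertmanager's inhibit_rules syntax, each scoped with an equal list. The alert names (ClusterDown, ServiceDown, RegionDown, MaintenanceActive), the service name payments-db, and the labels depends_on and maintenance_impact are hypothetical; the sketch assumes your alerting rules attach cluster, env, and region labels consistently to both source and target alerts.

```yaml
inhibit_rules:
  # Pattern 2: a down cluster mutes every other alert with the same cluster label.
  - source_matchers:
      - alertname="ClusterDown"
    target_matchers:
      - alertname!="ClusterDown"
    equal: [cluster]

  # Pattern 3: an outage of one service mutes alerts from services that depend on it.
  # Assumes dependents carry a hypothetical depends_on label naming that service.
  - source_matchers:
      - alertname="ServiceDown"
      - service="payments-db"
    target_matchers:
      - depends_on="payments-db"
    equal: [env]

  # Pattern 4: a down region mutes every other alert with the same region label.
  - source_matchers:
      - alertname="RegionDown"
    target_matchers:
      - alertname!="RegionDown"
    equal: [region]

  # Pattern 5: while maintenance is active on a cluster, mute alerts tagged as
  # expected impact. MaintenanceActive must be raised by something you control,
  # such as an alerting rule on a maintenance-mode metric (hypothetical).
  - source_matchers:
      - alertname="MaintenanceActive"
    target_matchers:
      - maintenance_impact="expected"
    equal: [cluster]
```

In this form pattern 3 needs one rule per upstream service: equal compares labels of the same name on both alerts, and the service label on the source is not the same label as depends_on on its dependents.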
YAML examples
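```yaml
inhibit_rules:
  - source_matchers:
      # The parent alert that does the muting.
      - alertname="NodeDown"
    target_matchers:
      # The child alerts to mute: any alert whose name starts with "Pod".
      - alertname=~"Pod.*"
    # Mute a target only if its node label equals the firing NodeDown alert's node label.
    equal: [node]
```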
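The equal field is the key. A target is muted only if the labels listed in equal (here, node) have the same values on the source and the target, so a NodeDown for one node mutes only the pod alerts on that node. Without it, a single NodeDown anywhere would mute every Pod* alert in the fleet.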
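If the same node names can appear in more than one cluster, equal: [node] alone could let a NodeDown in one cluster mute pod alerts on a same-named node in another. A sketch of a tighter rule, assuming both alerts carry a cluster label (an assumption; the example above does not use one):

```yaml
inhibit_rules:
  - source_matchers:
      - alertname="NodeDown"
    target_matchers:
      - alertname=~"Pod.*"
    # Both labels must match: same cluster and same node name.
    equal: [cluster, node]
```

Alertmanager treats a missing label and an empty one as equal, so if neither alert has a cluster label, this rule behaves like the node-only version.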
Testing inhibition rules
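Test by triggering a known parent alert in staging and checking that its child alerts show as inhibited in Alertmanager and send no notifications. The children still fire in Prometheus; inhibition only mutes them.
Automate the same check in CI: send a synthetic parent alert and a matching child alert to a test Alertmanager, then assert that the child is reported as inhibited.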
Antipatterns
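- Inhibition without the equal field. One source alert anywhere mutes every matching target everywhere.
- Cascading inhibition rules without testing. When one rule's target is another rule's source, a single alert can mute far more than intended and hide real incidents.
- Inhibition by name pattern alone. A regex such as alertname=~"Pod.*" catches any alert that happens to share the prefix; confirm the relationship with labels.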
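One way to act on the first and third points is to select child alerts by a label your alerting rules set deliberately, and to keep the equal list. A minimal sketch; the scope label is hypothetical and assumes every pod-level alerting rule adds scope: pod.

```yaml
inhibit_rules:
  - source_matchers:
      - alertname="NodeDown"
    target_matchers:
      # Hypothetical label set on purpose by pod alerting rules,
      # instead of relying on an alertname prefix.
      - scope="pod"
    # Still scoped to the affected node.
    equal: [node]
```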
What to do this week
Three moves. (1) Apply one of the patterns above to your noisiest alert. (2) Measure pages per shift for one week before and one week after the change. (3) Schedule a quarterly review of your inhibition rules so the discipline survives team turnover.