Alertmanager Cheatsheet
Top commands.
Overview
Alertmanager sits between Prometheus and the human or system that gets notified. Its job is to turn raw firing alerts into focused, deduplicated, well-routed pages. This cheatsheet captures the commands and concepts that on-call engineers actually reach for at 2 a.m.
- Routing tree. Alerts route to the right team based on labels (team, severity, service); routing is hierarchical, so an alert that matches no child route falls through to its parent's default receiver.
- Silencing. Known issues, planned maintenance, and noisy alerts get silenced via amtool or the UI; keep matchers narrow and durations short so a silence neither hides unrelated alerts nor outlives its cause.
- Inhibition. A parent failure (cluster down) suppresses child symptoms (every pod alerting), which prevents alert storms during major events.
- Grouping and notification integrations. Related alerts collapse into one notification; PagerDuty, Slack, email, and webhook receivers cover most stacks.
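The hierarchical routing with fall-through described above can be modeled in a few lines. This is a minimal sketch, not Alertmanager's implementation: the Route class, the resolve function, and every receiver and label value are hypothetical names chosen for illustration, and it assumes Python 3.9 or later. Real Alertmanager routes also support regex matchers and a continue flag, which this sketch leaves out.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """One node of a simplified routing tree (illustrative model only)."""
    receiver: str
    match: dict[str, str] = field(default_factory=dict)  # label equality matchers
    children: list["Route"] = field(default_factory=list)

    def matches(self, labels: dict[str, str]) -> bool:
        # Every matcher must equal the alert's label; no matchers means match-all.
        return all(labels.get(k) == v for k, v in self.match.items())


def resolve(route: Route, labels: dict[str, str]) -> str:
    """Return the receiver of the deepest matching route; first matching child wins."""
    for child in route.children:
        if child.matches(labels):
            return resolve(child, labels)
    # No child matched: fall through to this node's own receiver.
    return route.receiver


# Hypothetical tree: team-level routes, one with a severity-based child.
tree = Route(
    receiver="default-slack",
    children=[
        Route(
            receiver="payments-slack",
            match={"team": "payments"},
            children=[
                Route(receiver="payments-pagerduty", match={"severity": "critical"}),
            ],
        ),
        Route(receiver="platform-slack", match={"team": "platform"}),
    ],
)

print(resolve(tree, {"team": "payments", "severity": "critical"}))  # payments-pagerduty
print(resolve(tree, {"team": "payments", "severity": "warning"}))   # payments-slack
print(resolve(tree, {"team": "search", "severity": "critical"}))    # default-slack
```

The third call shows the fall-through: no team route claims the alert, so it lands on the root's default receiver instead of being dropped.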
The approach
Config-as-code, route-by-team, silence with intent. The Alertmanager UI is for inspection; the YAML is the source of truth.
- amtool silence add. CLI for silencing with matchers, comment, and duration; the comment field is required by convention so the next on-call knows why.
- amtool alert query. CLI for inspecting currently firing alerts during an investigation; pair it with amtool silence query to see which silences are in effect.
- Route by labels, not by service name strings. Labels (team, severity, service) drive the tree; routing on free-text name patterns breaks when services are renamed or refactored.
- Inhibit cascading alerts and group by service. Parent inhibits child; same-service alerts group; both reduce pager-storm risk during real incidents.
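To make the last point concrete, here is a small sketch of how an inhibition rule and service grouping might shrink a burst of alerts. It is a simplified model rather than Alertmanager's code: the alerts, the rule dictionary with its source/target/equal keys, and the helper functions are all assumptions made for illustration, and matching is plain label equality.

```python
from collections import defaultdict

# Hypothetical firing alerts, each represented only by its label set.
alerts = [
    {"alertname": "ClusterDown", "cluster": "eu-1", "severity": "critical"},
    {"alertname": "PodCrashLooping", "cluster": "eu-1", "service": "checkout", "severity": "warning"},
    {"alertname": "HighLatency", "cluster": "eu-1", "service": "checkout", "severity": "warning"},
    {"alertname": "HighLatency", "cluster": "us-1", "service": "search", "severity": "warning"},
    {"alertname": "ErrorRateHigh", "cluster": "us-1", "service": "search", "severity": "warning"},
]

# One inhibition rule: a firing source alert mutes targets that share the `equal` labels.
rule = {
    "source": {"alertname": "ClusterDown"},
    "target": {"severity": "warning"},
    "equal": ["cluster"],
}


def matches(labels, matchers):
    """True if every matcher equals the corresponding label value."""
    return all(labels.get(k) == v for k, v in matchers.items())


def inhibited(alert, firing, rule):
    """True if some other firing source alert suppresses this target alert."""
    if not matches(alert, rule["target"]):
        return False
    return any(
        matches(src, rule["source"])
        and all(src.get(name) == alert.get(name) for name in rule["equal"])
        for src in firing
        if src is not alert
    )


def group(alerts, by):
    """Bucket alert names by the values of the `by` labels; one notification per bucket."""
    groups = defaultdict(list)
    for alert in alerts:
        key = tuple(alert.get(name, "") for name in by)
        groups[key].append(alert["alertname"])
    return dict(groups)


remaining = [a for a in alerts if not inhibited(a, alerts, rule)]
print(group(remaining, by=["service"]))
# {('',): ['ClusterDown'], ('search',): ['HighLatency', 'ErrorRateHigh']}
```

Five firing alerts become two notifications: the eu-1 symptoms are muted by ClusterDown, and the two us-1 search alerts collapse into one group. The equal list matters, because without it an outage in one cluster would mute warnings in every other cluster.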
Why this compounds
Alertmanager mastery keeps paying off: every tuned route reduces false pages, every inhibition rule heads off a class of alert storm, and the team's routing tree becomes a template every new service inherits.
- Lower pager fatigue. Tuned routing, grouping, and inhibition shrink page volume to actionable signal.
- Faster incident response. Focused alerts surface the actual problem instead of every downstream symptom.
- Reusable patterns. The routing tree becomes a template; new services inherit the team's silencing and grouping discipline.
- Institutional knowledge. amtool fluency teaches the team how Alertmanager actually behaves, not just the UI's view of it.
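As a starting point for that fluency, the two amtool commands named earlier can be assembled and printed before anyone runs them. This is a sketch under stated assumptions: silence_add and alert_query are hypothetical helper names, the matchers and ticket reference are placeholders, and amtool is assumed to already know the Alertmanager address through its config file or the --alertmanager.url flag. The helper only builds the command line; it does not execute anything.

```python
import shlex


def silence_add(matchers, comment, duration="2h", author=None):
    """Build an `amtool silence add` argument list; refuses an empty comment."""
    if not comment.strip():
        raise ValueError("a silence needs a comment explaining why it exists")
    args = ["amtool", "silence", "add", f"--comment={comment}", f"--duration={duration}"]
    if author:
        args.append(f"--author={author}")
    # Matchers are positional arguments written as label=value.
    args += [f"{k}={v}" for k, v in matchers.items()]
    return args


def alert_query(matchers, silenced=False):
    """Build an `amtool alert query` argument list for the given label matchers."""
    args = ["amtool", "alert", "query"]
    if silenced:
        args.append("--silenced")  # show silenced alerts instead of active ones
    args += [f"{k}={v}" for k, v in matchers.items()]
    return args


cmd = silence_add(
    {"alertname": "HighLatency", "service": "checkout"},
    comment="Planned DB failover, ticket OPS-0000 (placeholder)",
    duration="1h",
)
print(shlex.join(cmd))
print(shlex.join(alert_query({"team": "payments"}, silenced=True)))
```

Enforcing the comment in a wrapper turns the "required by convention" rule into something a script can check, while the printed command stays copy-pasteable for review.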
Alertmanager mastery is one of those operational disciplines that pays off across years. Nova AI Ops integrates with Alertmanager, surfaces routing patterns, and supports the team's alerting discipline.