Prometheus + Alertmanager
From zero to alerts firing into Slack. Covered in detail: install, instrumentation, alert rule, routing.
Step 1: Run Prometheus
Prometheus runs as a single binary or container. The minimal setup is one config file plus a port; everything else flows from there.
- Container. Boot Prometheus with your config mounted in: docker run -d -p 9090:9090 -v "$PWD/prometheus.yml:/etc/prometheus/prometheus.yml" prom/prometheus
- Config minimum. A starter prometheus.yml with one scrape job pointing at localhost:9090 (Prometheus scraping itself) proves the loop end to end.
- UI. Open localhost:9090; the Status, Targets, and Graph pages are the three you will live in.
- Storage. The container is ephemeral. For production, add a volume so the TSDB survives restarts; the official image keeps its data in /prometheus, so -v prometheus-data:/prometheus works.
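The starter config from the list above can be written in one step. This is a minimal sketch: the file name matches the mount in the docker run command, the 15s intervals are illustrative choices (Prometheus falls back to 1m when they are unset), and the validation step only runs if promtool, which ships with Prometheus, is on your PATH.

```shell
set -eu

# Minimal config: Prometheus scrapes its own /metrics endpoint.
# The 15s intervals are illustrative; Prometheus defaults to 1m if unset.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF

# Validate the file if promtool is installed locally; skip otherwise.
if command -v promtool >/dev/null 2>&1; then
  promtool check config prometheus.yml
fi
```

After the container starts, the Targets page should list a single prometheus job reporting UP.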
Step 2: Instrument app
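- Add a metrics endpoint to your app with the Prometheus client library for its language; by convention it is served at /metrics.
- Configure a scrape job for the app in prometheus.yml. Inside the Prometheus container, localhost is the container itself, so target the app by a hostname the container can resolve.
- Reload: kill -HUP $(pidof prometheus), or docker kill --signal=HUP <container> when Prometheus runs in Docker.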
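A sketch of the scrape and reload steps, assuming a hypothetical app reachable from the Prometheus container as myapp:8000 (override with the APP_TARGET variable) and a Prometheus container started with --name prometheus; neither name comes from this tutorial. The reload is skipped when no such container is running.

```shell
set -eu

# Hypothetical app target, resolvable from inside the Prometheus container
# (e.g. both containers on the same user-defined Docker network).
APP_TARGET="${APP_TARGET:-myapp:8000}"

# Unquoted EOF so the shell expands ${APP_TARGET} into the file.
cat > prometheus.yml <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: myapp            # hypothetical job name
    metrics_path: /metrics     # the conventional client-library path (also the default)
    static_configs:
      - targets: ['${APP_TARGET}']
EOF

if command -v promtool >/dev/null 2>&1; then
  promtool check config prometheus.yml
fi

# Send SIGHUP to a running container named "prometheus" (assumed name)
# so it re-reads its config; do nothing if Docker or the container is absent.
if command -v docker >/dev/null 2>&1 \
  && docker ps --format '{{.Names}}' 2>/dev/null | grep -qx prometheus; then
  docker kill --signal=HUP prometheus
fi
```

Rewriting the file with cat > keeps its inode, which matters for single-file bind mounts: some editors save by replacing the file, and the container may keep seeing the old copy. Mounting the whole config directory instead of the single file avoids that.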
Step 3: Write alert rule
Alert rules live in a separate file referenced from the main config. Pin the rule to a real metric and a clear condition; vague rules produce noisy pages.
- File. alerts.yml with one rule (e.g. up == 0 for an instance that is down) under a groups: block.
- Reference. Add a rule_files: entry in prometheus.yml and mount alerts.yml into the container next to it; reload with kill -HUP, or POST to /-/reload if Prometheus was started with --web.enable-lifecycle.
- For clause. for: 5m avoids flapping: the alert stays Pending and only fires once the condition has held for 5 continuous minutes.
- Verify. localhost:9090/alerts shows each rule's state (Inactive, Pending, Firing); check it before you trust Alertmanager routing.
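A minimal alerts.yml matching the list above, plus the rule_files: reference. The group name, the InstanceDown alert name, and the severity: page label are illustrative choices; the label is what Alertmanager can route on later. When Prometheus runs in Docker, alerts.yml must also be mounted, for example -v "$PWD/alerts.yml:/etc/prometheus/alerts.yml", because relative rule_files paths resolve against the directory containing prometheus.yml.

```shell
set -eu

# One rule: fire when a scrape target has been down for 5 minutes.
# Quoted 'EOF' keeps {{ $labels.instance }} literal for Prometheus templating.
cat > alerts.yml <<'EOF'
groups:
  - name: availability          # illustrative group name
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page        # illustrative label, used for routing
        annotations:
          summary: "{{ $labels.instance }} of job {{ $labels.job }} is down"
EOF

# Reference the rule file; the path is relative to prometheus.yml's directory.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - alerts.yml

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF

if command -v promtool >/dev/null 2>&1; then
  promtool check rules alerts.yml
  promtool check config prometheus.yml
fi
```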
Step 4: Route to Slack
Alertmanager is the routing brain. Prometheus fires alerts; Alertmanager groups, deduplicates, and dispatches them to receivers such as Slack, PagerDuty, or email.
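- Run. Start Alertmanager with its config mounted in: docker run -d -p 9093:9093 -v "$PWD/alertmanager.yml:/etc/alertmanager/alertmanager.yml" prom/alertmanager
- Slack receiver. Configure a Slack incoming-webhook URL in alertmanager.yml under receivers:, and route on the severity label.
- Wire to Prometheus. Set alerting.alertmanagers in prometheus.yml to point at Alertmanager's port 9093. Containers cannot reach each other through localhost, so put both on a user-defined network (docker network create monitoring, then start each with --network monitoring and a --name) and target the Alertmanager container by name.
- Trigger. Break the condition manually (stop a target), wait out the for: window, confirm Slack receives the message, and tune grouping before going live.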
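A sketch of both sides of the wiring. The webhook URL is a placeholder read from a SLACK_WEBHOOK_URL variable; the receiver names, channels, the severity="page" matcher, and the alertmanager container name are hypothetical. The matchers: syntax assumes a recent Alertmanager release (older releases used match:). group_wait, group_interval, and repeat_interval are set to Alertmanager's defaults, written out so they are easy to tune.

```shell
set -eu

# Placeholder; export SLACK_WEBHOOK_URL with your real incoming-webhook URL.
SLACK_WEBHOOK_URL="${SLACK_WEBHOOK_URL:-https://hooks.slack.com/services/REPLACE/WITH/REAL}"

# Unquoted EOF so the webhook variable expands; nothing else here uses $.
cat > alertmanager.yml <<EOF
global:
  slack_api_url: '${SLACK_WEBHOOK_URL}'

route:
  receiver: slack-default          # fallback for anything not matched below
  group_by: ['alertname', 'job']   # one notification per alert name and job
  group_wait: 30s                  # these three timers are the defaults
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - matchers:
        - severity="page"
      receiver: slack-oncall

receivers:
  - name: slack-default
    slack_configs:
      - channel: '#alerts'
        send_resolved: true
  - name: slack-oncall
    slack_configs:
      - channel: '#oncall'
        send_resolved: true
EOF

# Prometheus side: reach Alertmanager by container name on a shared network.
# rule_files is kept from the alert-rule step.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - alerts.yml

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
EOF

# Validate the Alertmanager config if amtool is installed; skip otherwise.
if command -v amtool >/dev/null 2>&1; then
  amtool check-config alertmanager.yml
fi
```

Start both containers on the same user-defined network, with --name alertmanager and --name prometheus, so the alertmanager:9093 target resolves.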
Antipatterns
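- Default scrape interval. Prometheus scrapes every 1m unless scrape_interval is set; tune it globally or per job for your workload.
- No alert rule expiry. An alert that never resolves pages forever: Alertmanager re-sends it every repeat_interval (4h by default) until the condition clears or someone silences it.
- Alertmanager without grouping. If each alert is sent on its own, a cascading failure becomes a flood of messages; group related alerts with group_by (e.g. alertname, job) so one incident produces one notification.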
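Scrape interval can be set globally and overridden per job. A sketch with illustrative values: a 30s global default and a 10s override for a hypothetical latency-sensitive job at myapp:8000. Shorter intervals give finer resolution at the cost of more samples to store, and each job's scrape_timeout must not exceed its scrape_interval.

```shell
set -eu

# Global default for every job, plus a per-job override for a hypothetical
# fast-moving target. Values are illustrative, not recommendations.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 30s
  scrape_timeout: 10s

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  - job_name: latency-sensitive-app   # hypothetical job
    scrape_interval: 10s              # overrides the global 30s for this job only
    scrape_timeout: 5s                # must not exceed this job's scrape_interval
    static_configs:
      - targets: ['myapp:8000']       # hypothetical target
EOF

if command -v promtool >/dev/null 2>&1; then
  promtool check config prometheus.yml
fi
```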
What to do this week
Three moves. (1) Run the tutorial end-to-end on your own laptop / sandbox. (2) Apply the pattern to one production workload. (3) Document the variations you needed; share with the team.