Prometheus + Alertmanager Setup in 30 Minutes (Tutorial)
From zero to a working monitoring stack: install, expose your first metric, write a useful alert, and route it to Slack. The 30-minute path that skips the rabbit holes.
Step 1: Install Prometheus (5 min)
Docker is the fastest path. Create a working directory, drop a prometheus.yml in it (we will fill it in step 3), and start a container:
mkdir prom-stack && cd prom-stack
echo 'global: { scrape_interval: 15s }' > prometheus.yml
docker run -d --name prometheus \
-p 9090:9090 \
-v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus:latest
Open http://localhost:9090. You should see the Prometheus UI. The Status → Targets page will be empty for now; the minimal config has no scrape targets yet. That changes in step 3.
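If you'd rather verify from the terminal, Prometheus has built-in health and readiness endpoints:
curl -s http://localhost:9090/-/healthy
curl -s http://localhost:9090/-/ready
Both should return a short 2xx confirmation once the container is up.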
Step 2: Expose metrics from your app (5 min)
Most languages have a Prometheus client library that takes about a minute to wire up. The shape is always the same: import the library, register a counter or histogram, and expose /metrics over HTTP. Here it is in Node.js:
npm install prom-client express
// server.js
const express = require('express');
const promClient = require('prom-client');
const app = express();
// Counter labeled by HTTP method and response status code
const requests = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'status']
});
// Increment the counter once per finished response
app.use((req, res, next) => {
  res.on('finish', () =>
    requests.inc({ method: req.method, status: res.statusCode })
  );
  next();
});
// Exposition endpoint for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', promClient.register.contentType);
  res.end(await promClient.register.metrics());
});
app.get('/', (req, res) => res.send('hello'));
app.listen(3000);
Hit http://localhost:3000 a few times, then visit http://localhost:3000/metrics. You should see http_requests_total with counts per status code.
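The output is Prometheus's text exposition format; the counter lines should look roughly like this (counts and label values will differ):
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{method="GET",status="200"} 7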
Step 3: Configure scraping (5 min)
Update prometheus.yml to scrape your app:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - 'alerts.yml'
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['host.docker.internal:9093']
scrape_configs:
  - job_name: 'app'
    static_configs:
      - targets: ['host.docker.internal:3000']
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
Reload with docker kill -s HUP prometheus, then visit Targets again; app should show as UP. Two networking notes: host.docker.internal resolves to your host on Docker Desktop (on Linux, add --add-host=host.docker.internal:host-gateway to the docker run commands), and the Alertmanager target also goes through the host because container names don't resolve on Docker's default bridge network.
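You can confirm the same thing from the command line via the HTTP query API; the built-in up metric is 1 for every target Prometheus can reach:
curl -s 'http://localhost:9090/api/v1/query?query=up'
The JSON response should list one series per job, each with value 1.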
Step 4: Install Alertmanager (5 min)
cat > alertmanager.yml << 'EOF'
route:
  receiver: 'slack'
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 1h
receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'YOUR_SLACK_WEBHOOK_URL'
        channel: '#alerts'
        send_resolved: true
EOF
docker run -d --name alertmanager \
-p 9093:9093 \
-v $PWD/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
prom/alertmanager:latest
Get a webhook URL from Slack → Apps → "Incoming Webhooks" → create one for your channel. Replace YOUR_SLACK_WEBHOOK_URL, then recreate the container.
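To test the webhook without waiting for a real alert, you can push a hand-made alert straight into Alertmanager's v2 API (TestAlert is just an arbitrary name for this check):
curl -s -XPOST -H 'Content-Type: application/json' \
  http://localhost:9093/api/v2/alerts \
  -d '[{"labels":{"alertname":"TestAlert","severity":"page"}}]'
It should land in #alerts after the 10-second group_wait and resolve on its own a few minutes later, since no endsAt is set.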
Step 5: Write your first alert (5 min)
cat > alerts.yml << 'EOF'
groups:
  - name: app
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m]))
          > 0.05
        for: 2m
        labels:
          severity: page
        annotations:
          summary: "5xx rate above 5% for 2 minutes"
EOF
One catch: the step 1 container only mounts prometheus.yml, so alerts.yml isn't visible inside it yet. Recreate the Prometheus container with a second mount (sketch below); once it starts, the Alerts tab will show the rule. Trigger it by simulating errors against your app, and it should fire after the rule's 2-minute for: window.
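A sketch of the recreate, reusing the step 1 command with one extra volume; the /boom route is a hypothetical error endpoint you would add to server.js just for this test:
docker rm -f prometheus
docker run -d --name prometheus \
  -p 9090:9090 \
  -v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v $PWD/alerts.yml:/etc/prometheus/alerts.yml \
  prom/prometheus:latest
# In server.js, add a route that always fails:
#   app.get('/boom', (req, res) => res.status(500).send('boom'));
# Then keep the 5xx ratio above 5% for a few minutes:
while true; do curl -s -o /dev/null http://localhost:3000/boom; sleep 1; done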
Step 6: Route to Slack (5 min)
If steps 4 and 5 are correct, the first firing alert lands in your Slack channel within seconds. The default message is plain; tune it later by adding a title and text template under slack_configs.
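A minimal sketch of that tuning, using Alertmanager's title and text fields on the existing receiver; the template content here is only an example:
slack_configs:
  - api_url: 'YOUR_SLACK_WEBHOOK_URL'
    channel: '#alerts'
    send_resolved: true
    title: '{{ .Status | toUpper }}: {{ .CommonLabels.alertname }}'
    text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ "\n" }}{{ end }}'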
Verify the round trip end to end. The 30-minute clock stops here. Total cost: $0, two containers, one Node process, and three small config files. You now have monitoring.
Four bootstrap mistakes to avoid
Putting everything in labels. High-cardinality labels (request IDs, user IDs, full URLs) create a new time series per distinct value and explode Prometheus memory and storage. Keep label cardinality under 100 per metric until you understand the cost.
Default retention only. Prometheus keeps 15 days by default. Configure a longer window (30-90 days) with --storage.tsdb.retention.time before the first incident, not after; see the sketch after this list.
One alert, no severity. Use labels to distinguish severity: page (wakes someone) from severity: ticket (files a Jira). Mixing them trains people to ignore both.
No for: clause. Without it, a single bad scrape fires the alert. The for: duration is the simplest noise filter you have.
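For the retention point, here is a sketch of the step 1 run command with a longer window. Passing any flag to the prom/prometheus image replaces its default command, so --config.file and the storage path have to be repeated:
docker rm -f prometheus
docker run -d --name prometheus \
  -p 9090:9090 \
  -v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v $PWD/alerts.yml:/etc/prometheus/alerts.yml \
  prom/prometheus:latest \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/prometheus \
  --storage.tsdb.retention.time=60d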
Where to go next
This minimal stack is a starting point, not a destination. The next four investments, in order of value: (1) move to multi-window burn-rate alerts on your real SLOs, (2) add Grafana for dashboards, (3) replace Prometheus storage with long-term storage (Mimir, Thanos, Cortex) when you exceed local-disk capacity, (4) add per-team Alertmanager routing so the right team gets the right page.