Alerts Practical By Samson Tanimawo, PhD Published Dec 20, 2025 4 min read

Customer Impact Explicit in Alerts

Alerts should state customer impact plainly.

Why state impact explicitly

An alert that says "high CPU on api-gateway-3" leaves the on-call to derive impact. An alert that says "checkout API p99 1.2s, SLO 200ms, 4% of users affected" is actionable in seconds.

Triage time drops by half when the customer impact is in the alert payload. The on-call moves from "is this real?" straight to "how bad?".

Stakeholder communication becomes trivial. The on-call can copy the impact line into Slack and skip the translation step.

How to compute impact

For latency alerts, include the affected endpoint, p99 value, SLO, and percentage of requests over budget. Source from Prometheus or Datadog APM.

For error alerts, include error rate, baseline error rate, and the customer-facing surface ("5% of all logins failing").

For capacity alerts, include time-to-exhaustion. "Disk 85% full, 4 days at current rate" beats "disk 85% full".

Templating the alert

Use Prometheus alert templates with descriptive annotations. Datadog monitor messages support similar templating with `{{value}}` and tag substitution.

Standardise the impact line format across the org. "Service: X, Impact: Y, Symptom: Z" is enough; variation here costs more than it saves.

Render impact in the same place every time. The eye should land on the customer impact in under 2 seconds.

When to skip impact

Internal-only alerts where there is no customer surface. Background batch jobs, log retention, certificate renewals.

Synthetic alerts that fire before customer impact is measurable. The synthetic itself is the signal; impact is implied.

But: even synthetic alerts should mention the protected user journey. "Synthetic checkout flow failed" is fine; "check failed" is not.

Apply this week

Audit your top 10 paging alerts. Add explicit customer-impact text to each.

Add a code review check: any new alert without an impact field gets bounced.

Measure MTTA over the next month. Explicit impact typically cuts triage time by 30 to 50 seconds per page.