Alert Summary vs Detail
Alerts should summarise; detail is one click away.
The pattern
An alert payload should fit on a phone screen. Service name, customer impact, severity, runbook link.
Detail (full stack traces, dashboards, raw metric values) belongs one click away, not in the page itself.
The on-call reads the summary at 3am, decides whether to engage, and opens the detail only once at a laptop.
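A minimal Python sketch of that split, assuming a payload built in your own alerting code. The `AlertPayload` class, its field names, the `SEV2` label and the example.com URLs are all hypothetical; real paging tools each define their own payload schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertPayload:
    """Hypothetical payload: summary fields go in the page, detail stays behind one link."""
    service: str
    impact: str
    severity: str
    runbook_url: str
    detail_url: str  # dashboards, traces and raw metrics all live behind this link

    def page_text(self) -> str:
        # Render only the summary; nothing from the detail view is inlined.
        return (
            f"[{self.severity}] {self.service}: {self.impact}\n"
            f"Runbook: {self.runbook_url}\n"
            f"Detail: {self.detail_url}"
        )


alert = AlertPayload(
    service="checkout-api",
    impact="4% error rate, p99 latency 1.2s",
    severity="SEV2",
    runbook_url="https://runbooks.example.com/checkout-api",
    detail_url="https://grafana.example.com/d/abc123",
)
print(alert.page_text())
```

The page is three short lines; everything heavier is reachable through `detail_url`.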
What a good summary looks like
"checkout-api: p99 latency 1.2s, SLO 200ms, 4% error rate spike. Started 14:32 UTC. Runbook: ".
Service, what, by how much, when, where to look. Five clauses, one line each.
No emoji, no decorative text, no apologies. The on-call needs information, not tone.
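One way to enforce the five-clause shape is to build the summary from a template that takes exactly those five inputs. This is a sketch: `format_summary` is a hypothetical helper, and the 160-character budget is an assumed stand-in for "fits on a phone screen", not a documented limit of any paging client.

```python
from datetime import datetime, timezone

MAX_SUMMARY_CHARS = 160  # assumed budget; tune for your paging client


def format_summary(service: str, what: str, magnitude: str,
                   started: datetime, runbook_url: str) -> str:
    """Build a one-line summary: service, what, by how much, when, where to look.

    `started` should be timezone-aware so the UTC conversion is unambiguous.
    """
    summary = (
        f"{service}: {what}, {magnitude}. "
        f"Started {started.astimezone(timezone.utc):%H:%M} UTC. "
        f"Runbook: {runbook_url}"
    )
    if len(summary) > MAX_SUMMARY_CHARS:
        # Fail loudly rather than truncate: an over-long summary is a template bug.
        raise ValueError(f"summary is {len(summary)} chars; budget is {MAX_SUMMARY_CHARS}")
    return summary


print(format_summary(
    "checkout-api",
    "p99 latency 1.2s",
    "SLO 200ms, 4% error rate spike",
    datetime(2024, 1, 1, 14, 32, tzinfo=timezone.utc),
    "https://runbooks.example.com/checkout-api",  # hypothetical runbook URL
))
```

Raising instead of truncating keeps the problem visible in testing rather than on someone's lock screen.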
What good detail looks like
Linked dashboard with the relevant time range pre-selected. Datadog and Grafana both support this via URL parameters.
Recent deploys (Argo CD events, GitHub Actions runs). The on-call needs to know if a change preceded the alert.
Top affected endpoints, top affected customers, current load. All derivable from APM data; pre-render the link.
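A sketch of pre-rendering two of those detail items at alert time. It relies on Grafana accepting `from`/`to` URL parameters (epoch milliseconds or relative values such as `now`) and `var-<name>` parameters for template variables. The `service` template variable, the dashboard UID, and the `Deploy` record are assumptions; Argo CD and GitHub Actions each expose deploy history in their own shape, which you would first normalise into something like `Deploy`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def grafana_link(base_url: str, dashboard_uid: str, started: datetime, service: str,
                 lookback: timedelta = timedelta(minutes=30)) -> str:
    """Dashboard deep link covering shortly before the alert up to now."""
    params = {
        "from": int((started - lookback).timestamp() * 1000),  # epoch milliseconds
        "to": "now",
        "var-service": service,  # assumes the dashboard defines a `service` variable
    }
    return f"{base_url}/d/{dashboard_uid}?{urlencode(params)}"


@dataclass(frozen=True)
class Deploy:
    """Hypothetical deploy event, normalised from e.g. Argo CD or GitHub Actions history."""
    service: str
    version: str
    finished_at: datetime


def deploys_before(started: datetime, service: str, deploys: list[Deploy],
                   window: timedelta = timedelta(hours=2)) -> list[Deploy]:
    """Deploys of `service` that finished within `window` before the alert, newest first."""
    recent = [d for d in deploys
              if d.service == service and started - window <= d.finished_at <= started]
    return sorted(recent, key=lambda d: d.finished_at, reverse=True)


started = datetime(2024, 1, 1, 14, 32, tzinfo=timezone.utc)
print(grafana_link("https://grafana.example.com", "abc123", started, "checkout-api"))
```

Ending the range at `now` rather than at the alert time means the dashboard still shows current state whenever the on-call opens it.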
Anti-patterns
Alerts that paste 200 lines of stack trace into the page payload. Mobile clients truncate; the actual error is hidden.
Alerts that say "see Datadog" without a deep link. The on-call is forced through five manual steps at 3am.
Alerts with 12 fields, all at the same priority. The eye doesn't know where to land.
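For the stack-trace case, one option is to keep only the exception line in the page and link to the rest. This sketch assumes Python-style tracebacks, where the exception type and message come last; Java and .NET traces print the exception first. `summarise_trace` and the trace URL are hypothetical.

```python
def summarise_trace(trace: str, detail_url: str) -> str:
    """Swap a pasted stack trace for its exception line plus a link to the full trace.

    Assumes a Python-style traceback (exception line last). For Java or .NET
    traces, which print the exception first, take lines[0] instead.
    """
    lines = [line.strip() for line in trace.strip().splitlines() if line.strip()]
    headline = lines[-1] if lines else "(empty trace)"
    return f"{headline}\nFull trace: {detail_url}"


trace = (
    "Traceback (most recent call last):\n"
    '  File "app.py", line 10, in pay\n'
    "    charge(card)\n"
    "TimeoutError: upstream payment gateway timed out\n"
)
print(summarise_trace(trace, "https://traces.example.com/t/1"))
```

The actual error lands on the first line of the page instead of being cut off by a mobile client.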
Apply this week
Pick your 3 most-paged alerts. Rewrite each summary to fit one phone screen.
Move detail to a linked dashboard with pre-set time range and filters.
Test on a phone, not a monitor. The page is read on a phone first, every time.
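A rough CI guard can catch regressions between manual phone checks, though it does not replace reading the page on a real device. The 40-column and 8-row figures are assumptions about a narrow notification view, not measurements of any particular phone or app; `fits_phone_screen` is a hypothetical helper built on the standard `textwrap` module.

```python
import textwrap

PHONE_COLUMNS = 40  # assumed characters per line in a narrow notification
PHONE_ROWS = 8      # assumed lines visible without scrolling


def fits_phone_screen(page_text: str, columns: int = PHONE_COLUMNS,
                      rows: int = PHONE_ROWS) -> bool:
    """Approximate check: does the page wrap to at most `rows` lines at `columns` wide?"""
    wrapped: list[str] = []
    for line in page_text.splitlines():
        # textwrap.wrap returns [] for a blank line; still count it as one row.
        wrapped.extend(textwrap.wrap(line, width=columns) or [""])
    return len(wrapped) <= rows


summary = "checkout-api: p99 latency 1.2s, SLO 200ms, 4% error rate spike. Started 14:32 UTC."
print(fits_phone_screen(summary))
```

Running this over rendered alert templates in CI flags any page that has grown past the assumed screen before it reaches an on-call.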