Alerts With Sample Data Included

An alert without context is harder to act on. Include sample data in the alert.

The idea

Alerts that include a sample of the bad data can save around 5 minutes of triage on every page. “Error rate 12%, sample errors: error1, error2, error3” beats “error rate 12%”. Sample data answers the second question the on-call always asks (“what is the error?”), and not having to go and look it up matters at 3 a.m. Most alerting tools support templated samples.

How to include samples

The shape varies by alert type. Log-based alerts: top 3 error message strings deduplicated (Datadog and Splunk expose this); trace-based alerts: 1 trace ID with a direct link to the trace view; metric-based alerts: top affected dimensions like “customer-id, region” that tell the on-call who is affected.
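For the log-based case, the deduplication step can be sketched in a few lines of Python. This is a minimal sketch, not how any particular tool does it: the function names `top_error_samples` and `format_alert` are hypothetical, and it assumes the error messages that triggered the alert are already available as a list of strings.

```python
from collections import Counter


def top_error_samples(messages, limit=3):
    """Return the `limit` most frequent distinct messages with their counts."""
    # Strip whitespace so that trivially different copies count as one message.
    counts = Counter(m.strip() for m in messages if m.strip())
    return counts.most_common(limit)


def format_alert(summary, messages, limit=3):
    """Build the alert text: the summary line followed by the top samples."""
    lines = [summary, "Sample errors:"]
    for message, count in top_error_samples(messages, limit):
        lines.append(f"  {count}x {message}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative input only; real messages would come from the log query.
    errors = ["timeout"] * 4 + ["db down"] * 3 + ["bad gateway"] * 2 + ["oom"]
    print(format_alert("Error rate 12%", errors))
```

Showing the count next to each message is a design choice of this sketch: it tells the on-call whether one error dominates or several are contributing.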

How much data

Sample size has limits. Three samples is usually enough (more than 5 is noise and the page becomes unreadable); truncate long messages to 200 characters with stack traces linked rather than pasted; always link to the full data source because the sample is bait and full detail is one click away.
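These limits are easy to enforce in code before the alert is sent. The sketch below assumes the samples arrive as plain strings. The helper names and the `details_url` parameter are hypothetical, and collapsing a multi-line stack trace onto one line before truncating is one possible choice rather than a requirement.

```python
MAX_SAMPLES = 3   # the text suggests three is usually enough
MAX_CHARS = 200   # truncation limit from the text


def truncate(message, max_chars=MAX_CHARS):
    """Collapse whitespace and cut the message to at most max_chars characters."""
    flat = " ".join(message.split())
    if len(flat) <= max_chars:
        return flat
    return flat[: max_chars - 3] + "..."


def build_alert_body(summary, samples, details_url):
    """Build the alert text: summary, a few short samples, then the full-data link."""
    lines = [summary]
    for sample in samples[:MAX_SAMPLES]:
        lines.append(f"- {truncate(sample)}")
    hidden = len(samples) - MAX_SAMPLES
    if hidden > 0:
        # Say that more exists instead of pasting it into the page.
        lines.append(f"(+{hidden} more not shown)")
    lines.append(f"Full data: {details_url}")
    return "\n".join(lines)
```

Putting the link last keeps it in the same place on every alert, so the on-call always knows where to click for the full detail.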

Anti-patterns

Three anti-patterns survive too long. Samples that don’t match the alert condition (the alert fires on 5xx, the sample is a 4xx, and trust dies); samples without timestamps (the on-call wonders whether the data is from the current minute or from yesterday); samples that contain PII or credentials (mask emails, tokens, and addresses, because logs leak).
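All three can be guarded against where the samples are selected. The following is a rough sketch under stated assumptions: events are dictionaries with hypothetical `time` (a timezone-aware `datetime`), `status`, and `message` keys, and the two regular expressions are illustrative only. They catch common email and token shapes but not postal addresses or every credential format, so they are no substitute for a vetted redaction step.

```python
import re
from datetime import timezone

# Illustrative patterns only; real redaction rules need review.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TOKEN_RE = re.compile(
    r"\b(bearer\s+|(?:token|api[_-]?key)\s*[=:]\s*)\S+", re.IGNORECASE
)


def mask(message):
    """Replace email addresses and token-like values with placeholders."""
    message = EMAIL_RE.sub("[EMAIL]", message)
    return TOKEN_RE.sub(r"\1[REDACTED]", message)


def matching_samples(events, condition, limit=3):
    """Pick samples with the same predicate the alert fires on."""
    samples = []
    for event in events:
        if not condition(event):
            continue  # e.g. never show a 4xx for a 5xx alert
        # Prefix every sample with a UTC timestamp so its age is obvious.
        stamp = event["time"].astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        samples.append(f"{stamp} {event['status']} {mask(event['message'])}")
        if len(samples) == limit:
            break
    return samples
```

Passing the alert's own predicate as `condition` is the point of the design: the samples cannot drift away from what the alert actually fired on.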

Apply this week

Keep the rollout targeted. Pick your 5 most-paged alerts and add sample data to each; verify the sample reaches Slack and SMS clients intact (some clients truncate aggressively, so test on a real phone); mask any field that could contain customer data before it leaves the alerting pipeline.