Alerts Practical
By Samson Tanimawo, PhD · Published Dec 8, 2025 · 4 min read

Alerts as Data Pattern

Treat alert events as a data stream. Simple idea, powerful for analysis.

Alerts as a data stream

Every alert event (fired, acked, resolved, snoozed) is a row in a table. Pipe Alertmanager, PagerDuty, or Opsgenie webhooks into BigQuery, Snowflake, or ClickHouse.

Schema: alert_id, fired_at, ack_at, resolved_at, severity, owner_team, runbook_url, related_service. Add labels as JSON for flexible querying.
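A minimal sketch of the ingest mapping, assuming a hypothetical webhook payload shape (the incoming field names are assumptions; map them to whatever your provider actually sends):

```python
import json

def to_row(payload: dict) -> dict:
    """Normalize a hypothetical alert webhook payload into the schema above."""
    return {
        "alert_id": payload["id"],
        "fired_at": payload.get("fired_at"),
        "ack_at": payload.get("ack_at"),
        "resolved_at": payload.get("resolved_at"),
        "severity": payload.get("severity", "unknown"),
        "owner_team": payload.get("team"),
        "runbook_url": payload.get("runbook"),
        "related_service": payload.get("service"),
        # Everything else lands in the flexible JSON labels column.
        "labels": json.dumps(payload.get("labels", {})),
    }

row = to_row({"id": "a-1", "fired_at": "2025-12-01T03:14:00Z",
              "severity": "critical", "team": "payments",
              "labels": {"region": "eu-west-1"}})
```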

Retain 18 months. Long-window analysis (year-over-year noise, seasonality) needs a year of data minimum.

Queries that pay for themselves

Top 10 noisiest alerts last quarter. Use it to drive cleanup.
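In the warehouse this is a GROUP BY with a count; a stdlib-only sketch of the same ranking (event names are illustrative):

```python
from collections import Counter

events = [
    {"alert": "HighCPU"}, {"alert": "HighCPU"}, {"alert": "DiskFull"},
    {"alert": "HighCPU"}, {"alert": "DiskFull"}, {"alert": "PodCrash"},
]
# Rank by fire count; the head of the list is the first cleanup candidate.
noisiest = Counter(e["alert"] for e in events).most_common(10)
```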

Mean time to ack and to resolve, broken down by team, severity, and time of day. Spot rotations that are quietly burning out.
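The ack/resolve deltas fall straight out of the schema's timestamp columns. A sketch with made-up rows, grouped by team only (add severity and hour-of-day the same way):

```python
from collections import defaultdict
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"

def minutes(start: str, end: str) -> float:
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds() / 60

rows = [
    {"owner_team": "payments", "fired_at": "2025-12-01T03:00:00Z",
     "ack_at": "2025-12-01T03:10:00Z", "resolved_at": "2025-12-01T03:40:00Z"},
    {"owner_team": "payments", "fired_at": "2025-12-02T14:00:00Z",
     "ack_at": "2025-12-02T14:02:00Z", "resolved_at": "2025-12-02T14:20:00Z"},
]

mtta, mttr = defaultdict(list), defaultdict(list)
for r in rows:
    mtta[r["owner_team"]].append(minutes(r["fired_at"], r["ack_at"]))
    mttr[r["owner_team"]].append(minutes(r["fired_at"], r["resolved_at"]))

team_mtta = {t: sum(v) / len(v) for t, v in mtta.items()}  # mean time to ack, minutes
team_mttr = {t: sum(v) / len(v) for t, v in mttr.items()}  # mean time to resolve, minutes
```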

Correlation: which alerts fire together. The pairs reveal hidden dependencies and let you collapse 5 alerts into 1.
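One way to surface co-firing pairs is to count alerts that fire within the same short window. A sketch with an assumed 5-minute window and illustrative data:

```python
from collections import Counter
from datetime import datetime
from itertools import combinations

FMT = "%Y-%m-%dT%H:%M:%SZ"
WINDOW_SECONDS = 300  # 5 minutes; tune to your environment

fires = [
    ("DiskFull",    "2025-12-01T03:00:00Z"),
    ("HighLatency", "2025-12-01T03:02:00Z"),
    ("DiskFull",    "2025-12-02T09:00:00Z"),
    ("HighLatency", "2025-12-02T09:03:00Z"),
]

pairs = Counter()
for (a, ta), (b, tb) in combinations(fires, 2):
    gap = abs((datetime.strptime(ta, FMT) - datetime.strptime(tb, FMT)).total_seconds())
    if a != b and gap <= WINDOW_SECONDS:
        pairs[tuple(sorted((a, b)))] += 1

# pairs.most_common() surfaces the candidates for collapsing into one alert.
```

A pairwise scan is quadratic; in the warehouse you would self-join on a bucketed timestamp instead.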

What to put on the dashboard

Alert volume per week with a 13-week rolling average. Spikes warrant a postmortem.
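The rolling average and a simple spike flag can be sketched in a few lines (the 1.5x threshold is an illustrative assumption):

```python
def rolling_avg(xs: list[float], window: int = 13) -> list[float]:
    """Trailing rolling average; shorter windows at the start of the series."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

weekly_counts = [40, 42, 38, 45, 41, 39, 44, 43, 40, 42, 41, 38, 95]  # last week spiked
avgs = rolling_avg(weekly_counts)
spike = weekly_counts[-1] > 1.5 * avgs[-1]  # candidate for a postmortem
```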

Per-team page count per on-call shift. The team carrying 30 pages a shift will quit.

Alerts fired with no action taken, weighted by severity. This is your noise budget.
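A sketch of the noise budget as a single weighted score; the severity weights are illustrative and should be tuned to your own scale:

```python
SEVERITY_WEIGHTS = {"critical": 5, "warning": 2, "info": 1}

# Alerts that fired last period with no human action taken.
no_action = [
    {"alert": "HighCPU",    "severity": "warning"},
    {"alert": "HighCPU",    "severity": "warning"},
    {"alert": "CertExpiry", "severity": "critical"},
]

noise_score = sum(SEVERITY_WEIGHTS[a["severity"]] for a in no_action)
# Track noise_score week over week; a critical alert nobody acts on costs
# more budget than an info-level one.
```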

Retention and PII

Alert payloads can carry user IDs, IPs, error messages with email addresses. Strip PII at ingest, not at query time.

Use a deny-list on labels and a strict schema on the JSON column. Reject alerts that drop unstructured data into the stream.
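A minimal ingest-time scrubber, assuming hypothetical label names: a deny-list for known-PII keys, an allow-list standing in for the strict schema, and an email pattern for string values:

```python
import re

DENY_LABELS = {"user_id", "client_ip", "email"}        # known-PII keys, dropped outright
ALLOWED_KEYS = {"region", "cluster", "deployment"}     # strict schema: only these survive
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def scrub(labels: dict) -> dict:
    """Strip PII from alert labels before the row ever reaches the warehouse."""
    kept = {k: v for k, v in labels.items()
            if k not in DENY_LABELS and k in ALLOWED_KEYS}
    return {k: EMAIL.sub("[redacted]", v) if isinstance(v, str) else v
            for k, v in kept.items()}

scrubbed = scrub({"region": "eu-west-1", "user_id": "u-42",
                  "cluster": "Contact alice@example.com"})
```

Anything outside the allow-list never enters the stream, which is also where you would reject payloads carrying unstructured blobs.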

Encrypt at rest, restrict access to the on-call analytics group. Audit access quarterly.

When to invest in this

If your alert volume is above 50 a week or your team has more than 3 rotations, build the pipeline. The ROI is the alerts you retire.

Smaller teams can use PagerDuty Insights or the equivalent until volume justifies a custom warehouse.

Don't analyse alerts in spreadsheets. The work to keep a sheet current is more than the work to wire up a webhook.