Alert History Export
Alert history is data. Export it for analysis.
Why export alert history
Alerting tools typically keep 30-90 days of history by default, which is too short for trend analysis, postmortem reviews, or audits. Export to long-term storage (BigQuery, Snowflake, or S3 plus Athena) and treat 18-24 months of retention as the floor. The export is the foundation for the cleanup ritual, the on-call survey, and the noise budget.
- 30-90 day default retention. Too short for trend analysis, postmortem reviews, or audit.
- Long-term storage. BigQuery, Snowflake, S3 plus Athena; 18-24 months retention is the right floor.
- Foundation for downstream work. Cleanup ritual, on-call survey, noise budget all depend on the export.
- Per-export schema discipline. Stable schema makes downstream queries durable across tool migrations.
How to wire the export
The wiring is simple. PagerDuty webhooks fire on incident events; pipe them to a Lambda or Cloud Function that writes to a warehouse table. Datadog and Prometheus Alertmanager both support webhook receivers, so the same pattern applies to them. Keep the schema stable: incident_id, alert_name, fired_at, acked_at, resolved_at, owner_team, severity, and labels (stored as JSON).
- PagerDuty webhooks. Fire on incident events; pipe to Lambda or Cloud Function; write to warehouse table.
- Datadog and Alertmanager. Both support webhook receivers; use the same Lambda pattern.
- Stable schema. incident_id, alert_name, fired_at, acked_at, resolved_at, owner_team, severity, labels (JSON).
- Per-source schema mapping. Each source maps into the same schema, which makes cross-tool queries possible.
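The per-source mapping can be sketched as a small normalizer. This is a minimal illustration, not a reference implementation: it assumes an Alertmanager-style webhook payload whose alerts carry `fingerprint`, `status`, `startsAt`, `endsAt`, and a `labels` map, and it assumes a hypothetical labeling convention in which `team` and `severity` labels identify the owner and severity. The `AlertRecord` and `from_alertmanager` names are invented for this example.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AlertRecord:
    """One warehouse row in the shared schema."""
    incident_id: str
    alert_name: str
    fired_at: str                # ISO 8601 timestamp
    acked_at: Optional[str]
    resolved_at: Optional[str]
    owner_team: str
    severity: str
    labels: str                  # JSON-encoded label map


def from_alertmanager(payload: dict) -> List[AlertRecord]:
    """Map an Alertmanager-style payload (assumed shape) to schema rows."""
    records = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        resolved = alert.get("status") == "resolved"
        records.append(AlertRecord(
            incident_id=alert.get("fingerprint", ""),
            alert_name=labels.get("alertname", "unknown"),
            fired_at=alert["startsAt"],
            acked_at=None,  # no ack time in the assumed payload
            resolved_at=alert.get("endsAt") if resolved else None,
            owner_team=labels.get("team", "unowned"),      # assumed label
            severity=labels.get("severity", "unknown"),    # assumed label
            labels=json.dumps(labels, sort_keys=True),
        ))
    return records
```

A mapper for each other source (PagerDuty, Datadog) would return the same `AlertRecord` type, so the function that writes to the warehouse never needs to know where a row came from.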
Retention and access
Retention and access need a policy. Keep at least 18 months of history; 24 months covers a full year-over-year comparison with one rollover. Encrypt the data at rest, strip PII at ingest via a deny-list on label names, and restrict access to SREs and engineering leads, because alert history reveals who is burning out and which teams are noisy.
- 18 months minimum. 24 months covers full year-over-year analysis with one rollover.
- Encrypt at rest. Standard warehouse encryption; the data is sensitive.
- Strip PII at ingest. Apply a deny-list on label names so that user IDs and IP addresses are dropped before they reach the warehouse.
- Restrict access. SREs and engineering leads; alert history reveals burnout and noisy teams; treat like HR data.
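The deny-list step can be as small as a filter applied to each label map before the row is written. The sketch below is illustrative only: the label names in `DENIED_LABEL_NAMES` are hypothetical and would need to match whatever your own alerts actually carry.

```python
# Hypothetical deny-list; entries must be lowercase because matching
# is case-insensitive on the label name.
DENIED_LABEL_NAMES = frozenset({"user_id", "client_ip", "email"})


def strip_pii(labels: dict, denied=DENIED_LABEL_NAMES) -> dict:
    """Return a copy of `labels` without any deny-listed label names."""
    return {
        name: value
        for name, value in labels.items()
        if name.lower() not in denied
    }
```

Filtering on label names rather than values keeps the rule cheap and predictable, at the cost of missing PII that ends up under an unexpected label name, so the list needs periodic review.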
Queries that pay back the work
Three queries pay for the export immediately. Top noisy alerts drive the quarterly cleanup; time-to-ack and time-to-resolve per team drive rotation rebalancing; alerts during deploy windows catch deploy-induced noise that should be silenced or fixed.
- Top noisy alerts. Drives the quarterly cleanup; the most expensive alerts surface first.
- Time-to-ack and time-to-resolve per team. Drives rotation rebalancing; capacity follows the data.
- Deploy-window alerts. Catches deploy-induced noise that should be silenced or fixed.
- Each query stored as a view. Keeping the queries as warehouse views lets teams reuse them without re-deriving the logic.
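To make the three queries concrete, here is a rough Python sketch of their logic over rows already loaded from the export. In practice they would more likely be SQL views in the warehouse; this version only illustrates the computation. It assumes each row is a dict in the shared schema with `fired_at`, `acked_at`, and `resolved_at` parsed into `datetime` values, that deploy start times come from a separate source not described here, and that a 30-minute deploy window is a reasonable default. All function names are invented for this example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta
from statistics import median


def top_noisy_alerts(rows, n=10):
    """Return the n most frequently fired alert names with their counts."""
    return Counter(r["alert_name"] for r in rows).most_common(n)


def median_seconds_per_team(rows, end_field):
    """Median seconds from fired_at to end_field, grouped by owner_team.

    end_field is "acked_at" (time-to-ack) or "resolved_at" (time-to-resolve).
    """
    durations = defaultdict(list)
    for r in rows:
        if r.get(end_field) is None:
            continue  # never acked, or still open
        delta = r[end_field] - r["fired_at"]
        durations[r["owner_team"]].append(delta.total_seconds())
    return {team: median(values) for team, values in durations.items()}


def alerts_in_deploy_windows(rows, deploys, window=timedelta(minutes=30)):
    """Rows that fired within `window` after any deploy start time."""
    return [
        r for r in rows
        if any(d <= r["fired_at"] <= d + window for d in deploys)
    ]


# Example with made-up rows:
t0 = datetime(2024, 5, 1, 10, 0)
sample = [
    {"alert_name": "HighLatency", "owner_team": "payments",
     "fired_at": t0, "acked_at": t0 + timedelta(minutes=4),
     "resolved_at": t0 + timedelta(minutes=20)},
    {"alert_name": "HighLatency", "owner_team": "payments",
     "fired_at": t0 + timedelta(hours=2),
     "acked_at": t0 + timedelta(hours=2, minutes=6), "resolved_at": None},
]
noisy = top_noisy_alerts(sample, n=1)
ack = median_seconds_per_team(sample, "acked_at")
near_deploy = alerts_in_deploy_windows(sample, [t0 - timedelta(minutes=5)])
```

Using the median rather than the mean keeps one badly handled incident from dominating a team's time-to-ack figure.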
Build vs buy
The build-vs-buy decision is data-driven. PagerDuty Insights and Datadog Watchdog cover the basics; use them until you outgrow the queries they support. Build a custom warehouse only when you need cross-tool analysis or labels they don’t expose; the queries you cannot run today are the justification.
- Vendor analytics for basics. PagerDuty Insights and Datadog Watchdog; use them until you outgrow them.
- Custom for cross-tool. Build when you need cross-tool analysis or labels they don’t expose.
- Justification: queries you cannot run. Build because the queries are worth running, not to look smart.
- Per-team build-vs-buy review. Document the decision so it can be revisited as tooling evolves.