Alert Source of Truth

Alerts defined in many places drift. Single source of truth.

The multi-tool problem

Most stacks have alerts in 3+ places: Datadog monitors, Prometheus rules, PagerDuty event rules, plus a few SaaS tools. Without a source of truth, alerts diverge: the same condition is alerted differently in each tool, or alerts exist in one tool nobody knows about. Pick one source of truth per layer and treat the others as derived.

Git is the source of truth

Git is the right source of truth. Alert config in git, applied via Terraform, Pulumi, or a custom controller (Datadog provider, Prometheus rule files, PagerDuty Terraform); console edits are forbidden because CI runs terraform plan weekly to detect drift and the drift triggers a ticket to the offending team; branch-protected and code-reviewed like application code.

The alert inventory

The inventory is generated, not maintained. Single inventory with alert name, owner team, severity, source tool, runbook URL, dashboard URL; generate from the git repo (canonical) rather than each tool’s API (deployment target); publish weekly so anyone can search for “who owns alert X” without asking around.

Migrating off console-edited alerts

The migration is staged. Phase 1: import existing alerts via terraform import or equivalent (one-time bulk migration). Phase 2: lock down console write access (read-only for everyone except a break-glass account). Phase 3: monthly drift detection (drift means someone broke the rule, investigate).

When to pick the IaC approach

The decision is scale-driven. Above 50 alerts or 3+ teams, the IaC approach pays back within a quarter; below that, a documented spreadsheet inventory is enough; don’t try to manage alerts in 5 tools by hand because the drift will eat you.