Alert Source of Truth
Alerts defined in many places drift apart. Keep a single source of truth.
The multi-tool problem
Most stacks have alerts in 3+ places: Datadog monitors, Prometheus rules, PagerDuty event rules, plus a few SaaS tools.
Without a source of truth, alerts diverge. The same condition is alerted on differently in each tool, or alerts exist in one tool that nobody knows about.
Pick one source of truth per layer and treat the others as derived.
Git is the source of truth
Keep alert config in git and apply it via Terraform, Pulumi, or a custom controller: the Datadog Terraform provider for monitors, rule files for Prometheus, and the PagerDuty Terraform provider for event rules.
Console edits are forbidden. CI runs `terraform plan` weekly to detect drift; drift triggers a ticket to the offending team.
Alert config is branch-protected and code-reviewed, the same workflow as application code.
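The scheduled drift check could look roughly like the sketch below. It relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when nothing would change, 1 on error, and 2 when changes are pending. The function names and the working-directory argument are illustrative assumptions, and filing the ticket is left to whatever the CI system provides.

```python
import subprocess


def classify_plan_exit(code: int) -> str:
    """Map a `terraform plan -detailed-exitcode` exit code to a status."""
    if code == 0:
        return "in_sync"  # live config matches git
    if code == 2:
        return "drift"    # plan found pending changes
    return "error"        # 1 (or anything unexpected): the plan itself failed


def check_drift(workdir: str) -> str:
    """Run a non-interactive plan in `workdir` (hypothetical path) and classify it."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)


# Example use in a scheduled CI job (directory name is an assumption):
#   status = check_drift("alerts/datadog")
#   if status == "drift": open a ticket for the owning team
```

Treating "error" separately from "drift" keeps a broken plan (expired credentials, provider outage) from being reported as someone editing the console.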
The alert inventory
Build a single inventory: alert name, owner team, severity, source tool, runbook URL, dashboard URL.
Generate from the git repo, not from each tool's API. The git repo is canonical; the API is the deployment target.
Publish the inventory weekly. Anyone can search for "who owns alert X" without asking around.
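A minimal sketch of the inventory step follows, assuming each alert's metadata has already been parsed from files in the repo into a dict. The on-disk format, the field keys, and the function names are assumptions for illustration, not a prescribed schema.

```python
import csv
import io

# Inventory columns, in publication order (key names are assumed).
FIELDS = ["name", "owner_team", "severity", "source_tool", "runbook_url", "dashboard_url"]


def build_inventory(records):
    """Validate alert metadata dicts and return inventory rows sorted by name."""
    rows = []
    for record in records:
        missing = [field for field in FIELDS if not record.get(field)]
        if missing:
            name = record.get("name", "<unnamed>")
            raise ValueError(f"alert {name!r} is missing: {', '.join(missing)}")
        rows.append({field: record[field] for field in FIELDS})
    return sorted(rows, key=lambda row: row["name"])


def to_csv(rows):
    """Render the inventory as CSV text, ready to publish."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def find_owner(rows, alert_name):
    """Answer "who owns alert X"; returns None if the alert is not in the inventory."""
    for row in rows:
        if row["name"] == alert_name:
            return row["owner_team"]
    return None
```

Failing the build when a field is missing means an alert cannot merge without an owner and a runbook, which is what makes the published inventory trustworthy.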
Migrating off console-edited alerts
Phase 1: import existing alerts via `terraform import` or the equivalent. This is a one-time bulk migration.
Phase 2: lock down console write access. Read-only for everyone except a break-glass account.
Phase 3: recurring drift detection, on the same weekly `terraform plan` schedule that CI already runs. Drift means someone broke the rule; investigate.
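To size the Phase 1 import and to spot-check later phases, it can help to compare the alert names declared in git against the names a tool actually reports. The sketch below assumes both lists have already been collected; how they are fetched from each tool is out of scope, and the function name and result keys are hypothetical.

```python
def reconcile(declared_in_git, deployed_in_tool):
    """Compare alert names declared in git with names found in a tool."""
    declared = set(declared_in_git)
    deployed = set(deployed_in_tool)
    return {
        # Present in the tool but not in git: import them or delete them.
        "unmanaged": sorted(deployed - declared),
        # Present in git but not in the tool: never applied, or removed by hand.
        "undeployed": sorted(declared - deployed),
    }
```

An empty "unmanaged" list is a reasonable exit criterion for Phase 1; anything that shows up there after the Phase 2 lockdown points at the break-glass account or a gap in the lockdown.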
When to pick the IaC approach
Above 50 alerts or 3+ teams, the IaC approach pays back within a quarter.
Below that, a documented spreadsheet inventory is enough.
Don't try to manage alerts in 5 tools by hand. The drift will eat you.