Alert Scope Creep Prevention

Alerts that get tweaked drift from purpose. Prevent.

The problem

Alerts created with a clear purpose drift over time. A new condition is added, then another, until the alert fires on a different problem than originally intended; six months later the on-call sees the alert fire and has no idea what to do because the original purpose is lost. Scope creep is silent and happens through small PRs each defensible in isolation.

How it happens

Three patterns produce scope creep. An incident teaches you that condition X also matters and you add it to an existing alert instead of creating a new one; a team member who didn’t write the alert tweaks the threshold to fix a single false positive (the alert now fires for a different reason); aggregate operators (max, p99) get added incrementally and nobody knows which signal is actually being measured.

Prevention

Three mechanisms prevent scope creep. Every alert has a one-line purpose statement (“detect customer-facing checkout latency above SLO”) in the alert payload; code review requires any change to the alert logic to explain whether the purpose statement still holds; quarterly alert review walks the catalog with the owning team and splits or deletes anything where purpose has drifted.

Split vs extend

The default is splitting. Two alerts with clear purposes are easier to operate than one alert with two purposes; extend only when the underlying signal is genuinely the same (“p99 from Prometheus” plus “p99 from Datadog APM” is one alert with redundancy, not two purposes); if a single alert needs two runbooks, it’s two alerts.

Apply this quarter

The application is concrete. Audit your top 10 paging alerts and confirm each has a single, clear purpose; split any alert with two purposes (migrate carefully and don’t delete the original until the new alerts have run for 2 weeks); add the purpose-statement field to your alert template and reject new alerts without one.