Alert Acceptance Test Tracking

Track which alerts have passed acceptance tests.

Why alerts need acceptance tests

Most alerts are never proven to fire. They sit in Prometheus or Datadog config for years, until the day they are supposed to page and silently don’t. The discipline below treats every alert rule the way you treat code: untested code does not ship, and untested alerts should not page.

How to run the test

The test injects the failure mode the alert claims to detect, then verifies the page lands where it should within the window the SLO requires. Everything else is bookkeeping.
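
The inject-then-verify loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive harness: the names run_acceptance_test, AcceptanceResult, inject_failure, page_received, and clear_failure are all hypothetical, and the three callables stand in for whatever your fault-injection tooling and paging provider actually expose.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AcceptanceResult:
    alert_name: str
    passed: bool
    seconds_to_page: Optional[float]  # None if no page was observed


def run_acceptance_test(
    alert_name: str,
    inject_failure: Callable[[], None],   # hypothetical: starts the fault
    page_received: Callable[[], bool],    # hypothetical: True once the page reached its target
    clear_failure: Callable[[], None],    # hypothetical: removes the fault
    deadline_seconds: float,
    poll_interval_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> AcceptanceResult:
    """Inject the failure, then poll until the page lands or the deadline passes."""
    start = clock()
    inject_failure()
    try:
        while True:
            got_page = page_received()
            elapsed = clock() - start
            if got_page:
                # A page that arrives after the deadline still counts as a failure.
                return AcceptanceResult(alert_name, elapsed <= deadline_seconds, elapsed)
            if elapsed >= deadline_seconds:
                return AcceptanceResult(alert_name, False, None)
            sleep(poll_interval_seconds)
    finally:
        # Always remove the injected fault, even if polling raised.
        clear_failure()
```

In this sketch, "lands where it should" lives entirely inside page_received: that callable should check the intended destination (the right rotation or channel), not merely that some notification was sent. The clock and sleep parameters exist only so the loop can be exercised with a fake clock instead of waiting in real time.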

What to track

You cannot improve a number you do not publish. The four counts below go on the reliability dashboard and into the weekly review.

Tooling that helps

You do not need a custom platform. The pieces below glue together in a long weekend and cover the most common alerting stacks.

Adopt incrementally

Most teams that try to test every alert at once stall before they ship the first one. Sequence matters.