Alerts Practical By Samson Tanimawo, PhD Published Jan 16, 2026 4 min read

Alert Test-Fire Pattern

Synthetically fire alerts to verify the pipeline.

Why test-fire alerts

Most alerts have never fired in production. The wiring (rule, receiver, escalation, runbook link) is unproven.

A test fire confirms the path end-to-end: from metric to page to acknowledged human, in under 5 minutes.

Without test fires, the first real fire is also the first integration test. Don't run integration tests during outages.

How to test-fire

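Inject a synthetic metric that crosses the threshold. For Prometheus, which pulls metrics rather than receiving them, stand up a test scrape target (or push to a Pushgateway) that returns a threshold-crossing value for 5 minutes. The value must hold longer than the rule's for: duration, or the alert never leaves the pending state.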
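One way to inject such a metric is a throwaway scrape target. The sketch below uses only the Python standard library and serves a single gauge in the Prometheus text exposition format. The metric name, value, port, and class name are hypothetical; pick a value above whatever threshold the rule under test uses.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metric(name: str, value: float, labels: dict[str, str]) -> str:
    """Render one gauge sample in the Prometheus text exposition format.

    Label values are not escaped here, so keep them free of quotes and backslashes.
    """
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return f"# TYPE {name} gauge\n{name}{{{label_str}}} {value}\n"


class SyntheticTarget(BaseHTTPRequestHandler):
    # Hypothetical metric name and value; set the value above the rule's threshold.
    metric_name = "test_fire_error_ratio"
    metric_value = 0.9

    def do_GET(self) -> None:
        # Every path returns the same sample, so /metrics works without routing.
        body = render_metric(self.metric_name, self.metric_value, {"env": "test"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9105) -> None:
    """Serve the synthetic metric until interrupted; keep it up for the test window."""
    HTTPServer(("", port), SyntheticTarget).serve_forever()
```

Calling serve() and pointing a test scrape job at the port keeps the value above threshold for as long as the process runs. Stopping the process ends the test and should let the alert resolve.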

Use a labeled test rule with env=test in the matcher. The test fire is a real alert on the real routing path; the env=test match sends it to a non-prod PagerDuty service instead of the production one.
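The routing idea can be illustrated with a simplified model of first-match label routing. This is not Alertmanager's implementation, only a sketch of the behavior the env=test matcher is meant to produce. The receiver names and the alert name are hypothetical.

```python
def pick_receiver(labels: dict[str, str], routes: list[dict], default: str) -> str:
    """Return the receiver of the first route whose matchers all equal the alert's labels."""
    for route in routes:
        if all(labels.get(k) == v for k, v in route["match"].items()):
            return route["receiver"]
    return default


# Hypothetical receiver names; the env=test route is listed first so it wins.
ROUTES = [
    {"match": {"env": "test", "severity": "page"}, "receiver": "pagerduty-nonprod"},
    {"match": {"severity": "page"}, "receiver": "pagerduty-prod"},
]

test_alert = {"alertname": "HighErrorRatio", "severity": "page", "env": "test"}
real_alert = {"alertname": "HighErrorRatio", "severity": "page", "env": "prod"}

print(pick_receiver(test_alert, ROUTES, default="email"))
print(pick_receiver(real_alert, ROUTES, default="email"))
```

Ordering matters in this model: if the broader severity=page route came first, the test fire would page the production service.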

Verify acknowledgment, escalation, and resolution end-to-end.
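A small check can make "end-to-end" concrete. The sketch below assumes the incident's history has already been exported as an ordered list of stage names; the stage names are hypothetical and would need mapping from whatever the paging tool actually records.

```python
# Hypothetical stage names, in the order a full test fire should produce them.
EXPECTED_STAGES = ["triggered", "escalated", "acknowledged", "resolved"]


def missing_stages(events: list[str], expected: list[str] = EXPECTED_STAGES) -> list[str]:
    """Return expected stages that never occurred, or occurred out of order."""
    missing = []
    position = 0
    for stage in expected:
        try:
            # Search only after the previous match so ordering is enforced.
            position = events.index(stage, position) + 1
        except ValueError:
            missing.append(stage)
    return missing
```

An empty result means the page was triggered, escalated, acknowledged, and resolved in that order. Anything else names the stage that needs investigating.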

How often

At rule creation: mandatory. The rule isn't merged until a test fire confirms the path.

Quarterly: rotate through the rule list and fire 10% per week. A full pass takes 10 weeks, so every paging-tier rule sees a test within 90 days.
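The rotation can be sketched as a deterministic slice of the rule list per week. The function name and the 10-week cycle parameter are assumptions for illustration; rounding the batch size up means each week covers slightly more than 10% when the list does not divide evenly.

```python
import math


def weekly_batch(rules: list[str], week: int, weeks_per_cycle: int = 10) -> list[str]:
    """Return the slice of rules to test-fire in the given week (0-based)."""
    ordered = sorted(rules)  # fixed order so the rotation is reproducible
    size = math.ceil(len(ordered) / weeks_per_cycle)
    start = (week % weeks_per_cycle) * size
    return ordered[start:start + size]
```

With 23 rules, each week fires 3 of them and the list is exhausted before week 10, leaving slack in the quarter for reruns of failed tests.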

After any change to receivers, escalation, or PagerDuty service config: re-test the affected rules.

Automation

GitHub Actions or Argo Workflows runs the test injector on a schedule. The job fails if the page isn't acknowledged within the window.
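The pass/fail check in such a job might look like the polling loop below. It is a sketch: fetch_status is a hypothetical callable that returns the test incident's current status as a string, and the status names are assumptions. The clock and sleep parameters exist so the loop can be exercised without waiting in real time.

```python
import time
from typing import Callable


def wait_for_ack(
    fetch_status: Callable[[], str],
    window_seconds: float = 300.0,
    poll_seconds: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the test incident is acknowledged or the window closes."""
    deadline = clock() + window_seconds
    while True:
        # A resolved incident counts too: it was handled within the window.
        if fetch_status() in ("acknowledged", "resolved"):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
```

In the scheduled job, exiting non-zero when wait_for_ack returns False is enough for the workflow runner to mark the run as failed.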

Datadog's API-driven monitors and synthetic tests cover the SaaS path. Trigger via Terraform.

Maintain a list of last-tested timestamps in the alert catalog. Stale tests are visible to all.
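A staleness report over those timestamps is a few lines. The sketch assumes the catalog can be loaded as a mapping from rule name to the time of its last successful test fire. The function name is hypothetical, and the 90-day default mirrors the quarterly target.

```python
from datetime import datetime, timedelta, timezone


def stale_rules(
    last_tested: dict[str, datetime],
    now: datetime,
    max_age_days: int = 90,
) -> list[str]:
    """Return rule names whose last test fire is older than max_age_days, oldest first."""
    cutoff = now - timedelta(days=max_age_days)
    stale = [(ts, name) for name, ts in last_tested.items() if ts < cutoff]
    return [name for ts, name in sorted(stale)]
```

Publishing the returned list on a dashboard or in the weekly ops review is one way to keep stale tests visible.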

Do it for the paging tier

Skip the ticket and email tiers. The cost-benefit isn't there.

Don't test-fire during business hours unless the test channel is clearly labeled, so nobody mistakes a test for a real incident. Whatever the hour, avoid waking the real on-call.

After the first quarter, the test cost amortizes. The first quarter is the expensive one.