Noise vs Coverage Frontier
More alerts catch more issues but create more noise. This trade-off shapes every alerting decision.
The trade-off
Tighter alerts catch more incidents but produce more noise. Looser alerts produce less noise but miss real problems.
Every alert sits on the noise/coverage frontier. Moving the threshold trades one for the other.
There is no globally optimal point. Each service has its own ratio based on customer tolerance, team size, and traffic shape.
Measuring coverage
Track every customer-impacting incident. For each, ask: did an alert fire before the customer noticed?
Coverage = (incidents where an alert fired in time) / (all real incidents). Target 80% to 95% depending on service tier.
Below 80% coverage: under-alerted; you are missing real problems. Above 95%: likely over-alerted; the last few points of coverage tend to cost disproportionate noise.
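A minimal sketch of this computation, assuming incident records carry a flag for whether an alert fired before customers noticed (field names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Incident:
    service: str
    alert_fired_in_time: bool  # did an alert fire before the customer noticed?

def coverage(incidents: list[Incident]) -> float:
    """Fraction of real incidents that an alert caught in time."""
    if not incidents:
        return 1.0  # no incidents this window: nothing was missed
    caught = sum(1 for i in incidents if i.alert_fired_in_time)
    return caught / len(incidents)

# Example: 9 of 10 incidents caught in time -> 0.9, inside the 80-95% band.
```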
Measuring noise
Measure pages per real incident. If you page 5 times to catch one real incident, 4 of those pages were noise: a 4:1 noise ratio.
Acceptable ratios: 1:1 for tier 1 services, 2:1 for tier 2, 3:1 for tier 3. Above that, tune.
Track per service, not globally. A single noisy service inflates the org-wide average and hides healthy rotations.
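A sketch of the per-service computation, assuming page counts and real-incident counts are available per service for the measurement window (names are illustrative):

```python
def noise_ratios(pages_by_service: dict[str, int],
                 real_by_service: dict[str, int]) -> dict[str, float]:
    """Noisy pages per real incident, computed per service, never globally."""
    ratios = {}
    for service, pages in pages_by_service.items():
        real = real_by_service.get(service, 0)
        if real == 0:
            # Every page was noise, or the service was silent all quarter.
            ratios[service] = float("inf") if pages else 0.0
        else:
            # e.g. 5 pages for 1 real incident -> 4.0, i.e. a 4:1 noise ratio
            ratios[service] = (pages - real) / real
    return ratios
```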
Moving the frontier
Better signals move the entire frontier outward. SLO-based burn-rate alerts have lower noise at equal coverage than threshold alerts.
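To illustrate why burn-rate alerts are quieter, here is a sketch of a multi-window burn-rate check in the style of the Google SRE Workbook; the 14.4x factor for a 99.9% SLO is the standard paging example, and the error-ratio inputs are assumed to come from your metrics store:

```python
def burn_rate(error_ratio: float, slo_target: float = 0.999) -> float:
    """How fast the error budget is burning relative to a uniform burn."""
    return error_ratio / (1.0 - slo_target)

def should_page(err_1h: float, err_5m: float, threshold: float = 14.4) -> bool:
    # The long window filters transient blips (less noise); the short window
    # confirms the burn is ongoing, so already-recovered incidents stop paging.
    return burn_rate(err_1h) >= threshold and burn_rate(err_5m) >= threshold
```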
Multi-signal compound alerts ("errors AND latency AND traffic") shift the frontier most; single-signal alerts force the sharpest trade-off.
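A sketch of a compound predicate; the thresholds are placeholders, and the point is that all three conditions must hold before anyone gets paged:

```python
def compound_alert(error_rate: float, p99_latency_ms: float, rps: float) -> bool:
    # Fire only when errors are elevated AND users are actually waiting AND
    # there is enough traffic for the numbers to mean anything.
    return error_rate > 0.02 and p99_latency_ms > 500 and rps > 10
```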
Synthetic monitoring shifts coverage upward without much noise cost. Real-user monitoring (RUM) is similar.
Apply per service
Pick a service. Compute current noise ratio and coverage from last quarter's data.
Decide where on the frontier the team wants to be. Document this; it constrains future alert design.
Tune toward the target over the next quarter. Re-measure; iterate.
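A sketch of the quarterly check against a documented target; the target values are examples, not recommendations:

```python
def tuning_report(coverage: float, noise_ratio: float,
                  target_coverage: float = 0.90,
                  target_noise: float = 2.0) -> str:
    """Compare measured metrics to the team's documented frontier target."""
    if coverage < target_coverage:
        return "Under-alerted: add or tighten alerts; accept some extra noise."
    if noise_ratio > target_noise:
        return "Over-alerted: loosen thresholds or upgrade signals (burn rate, compound)."
    return "On target: hold steady and re-measure next quarter."
```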