Noise vs Coverage: The On-Call Trade-off
Tightening alerts reduces noise but risks missing real incidents. What follows is a framework for finding the right balance.
The cost of noise
Noisy paging costs sleep and builds alert fatigue, until responders start ignoring real alerts. That is the expensive failure mode.
Aim: real-page rate (the share of pages that turn out to be real incidents) above 70%.
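A minimal sketch of how the real-page rate could be computed, assuming each page is labeled real or not during post-shift triage. The `Page` record and its `was_real` field are hypothetical names for illustration, not part of any particular paging tool.

```python
from dataclasses import dataclass


@dataclass
class Page:
    alert_name: str
    was_real: bool  # hypothetical label, assumed to be set during triage


def real_page_rate(pages):
    """Return the fraction of pages that corresponded to a real incident."""
    if not pages:
        return None  # no pages in the window, so the rate is undefined
    return sum(p.was_real for p in pages) / len(pages)


# Illustrative data only: 3 of 4 pages were real.
pages = [
    Page("disk_full", True),
    Page("cpu_spike", False),
    Page("api_5xx", True),
    Page("api_5xx", True),
]
rate = real_page_rate(pages)
print(f"real-page rate: {rate:.0%}, meets 70% target: {rate > 0.70}")
```

Returning `None` for an empty window keeps a quiet week from being reported as either 0% or 100%.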
The cost of missed coverage
Coverage gaps show up as customer-impacting incidents that users detect before any alert does.
Aim: externally detected incidents below 5% of total incidents.
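The coverage target can be checked the same way. This is a sketch under the assumption that each incident record carries a `detected_by` field with the value `"alert"` or `"user"`; that field name and its values are illustrative, not taken from a specific incident tracker.

```python
def external_detection_rate(incidents):
    """Return the share of incidents first reported by users, not alerts."""
    if not incidents:
        return None  # no incidents in the window, so the rate is undefined
    external = sum(1 for i in incidents if i["detected_by"] == "user")
    return external / len(incidents)


# Illustrative data only: 19 incidents caught by alerts, 1 reported by a user.
incidents = [{"detected_by": "alert"} for _ in range(19)]
incidents.append({"detected_by": "user"})

ext_rate = external_detection_rate(incidents)
meets_target = ext_rate < 0.05  # the target is strictly below 5%
print(f"external-detected: {ext_rate:.0%}, meets target: {meets_target}")
```

In this example one user-reported incident out of twenty lands exactly on 5%, which does not satisfy a "below 5%" target.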
The tune
Both metrics together drive policy. If the real-page rate falls below target, tighten alerts. If too many incidents are detected externally, loosen thresholds or add alerts.
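One way to express that policy as a decision rule is sketched below. The thresholds come from the targets stated earlier; the function name and the action strings are assumptions made for illustration.

```python
REAL_PAGE_TARGET = 0.70  # real-page rate should be above this
EXTERNAL_TARGET = 0.05   # external-detected share should be below this


def tuning_actions(real_page_rate, external_rate):
    """Map the two metrics to the tuning actions described in the text."""
    actions = []
    if real_page_rate <= REAL_PAGE_TARGET:
        actions.append("tighten noisy alerts")
    if external_rate >= EXTERNAL_TARGET:
        actions.append("loosen thresholds or add alerts")
    # An empty list means both targets are met.
    return actions or ["no change"]


print(tuning_actions(0.55, 0.02))
print(tuning_actions(0.90, 0.10))
print(tuning_actions(0.90, 0.01))
```

Both actions can be returned at once. That is not contradictory when they apply to different alerts: some fire too often while other failure modes have no alert at all.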
Review quarterly. Traffic shifts over time, so alerts need recalibration.