On-Call Page Volume Targets
< 3 pages per shift.
Healthy page volumes
Per-shift target: under 3 pages. More than that signals either alert noise or genuine reliability problems.
Per-week target: under 10 pages per engineer. Beyond this, on-call quality degrades as engineers fatigue.
Per-night target (off-hours): under 1 page. Sleep is a precondition for next-day functioning.
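The three targets above can be checked mechanically. The following is a minimal sketch, assuming "under" is read as a strict inequality and that the inputs are averages over whatever window the team reports on; `PageRates` and `breached_targets` are hypothetical names, not part of any paging tool.

```python
from dataclasses import dataclass

# Thresholds copied from the targets above. "Under N" is treated as strict,
# so a value equal to N already counts as a breach (an assumption).
SHIFT_LIMIT = 3.0   # pages per shift
WEEK_LIMIT = 10.0   # pages per engineer per week
NIGHT_LIMIT = 1.0   # off-hours pages per night


@dataclass
class PageRates:
    """Average page rates for one engineer (hypothetical container)."""
    per_shift: float
    per_week: float
    per_night: float


def breached_targets(rates: PageRates) -> list[str]:
    """Return the names of the targets that the given rates fail to meet."""
    breached = []
    if rates.per_shift >= SHIFT_LIMIT:
        breached.append("shift")
    if rates.per_week >= WEEK_LIMIT:
        breached.append("week")
    if rates.per_night >= NIGHT_LIMIT:
        breached.append("night")
    return breached
```

For example, `breached_targets(PageRates(per_shift=2, per_week=12, per_night=0.5))` reports only the weekly target as breached.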
Measurement discipline
Per-engineer per-shift page count. Aggregated weekly; trended monthly.
Per-service page volume. Identifies the noisy services driving rotation pain.
Per-time-of-day distribution. Pages clustered in business hours pose a different problem from pages spread across all hours, which cut into sleep.
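The three measurements above reduce to counting the same page log along different keys. The sketch below assumes each page is recorded with an engineer, a service, and a timestamp. The `Page` record and `summarize` function are hypothetical. For brevity it keys the per-engineer count by ISO week rather than by shift, since shift boundaries are not defined here.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, NamedTuple


class Page(NamedTuple):
    """One page event (hypothetical record shape)."""
    engineer: str
    service: str
    fired_at: datetime


def summarize(pages: Iterable[Page]):
    """Count pages per engineer-week, per service, and per hour of day."""
    per_engineer_week = Counter()
    per_service = Counter()
    per_hour = Counter()
    for page in pages:
        iso = page.fired_at.isocalendar()  # (ISO year, ISO week, weekday)
        per_engineer_week[(page.engineer, iso[0], iso[1])] += 1
        per_service[page.service] += 1
        per_hour[page.fired_at.hour] += 1
    return per_engineer_week, per_service, per_hour
```

`per_service.most_common()` then lists the noisiest services first, and `per_hour` shows whether pages cluster in business hours or spread across the night.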
Responding to overflow
Rotation level: emergency staffing if volume is unsustainable. Backup on-call activated; alert tuning prioritised.
Service level: noisiest services get tuning sprints. Engineering capacity reallocated until volume drops.
Architecture level: persistently noisy services may indicate structural problems requiring re-architecture.
Page budget discipline
Per-team weekly page budget. Exceeding the budget triggers tuning time; staying under it leaves room for feature work.
Quarterly review of budget achievement. Teams chronically over budget need either additional staffing or architectural investment.
Budget overflow during major incidents is expected. The metric is normal-week volume, not incident-week volume.
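The budget rule, including the exemption for incident weeks, can be sketched as a simple filter. This assumes weekly totals are already available per team and that someone has labelled which weeks contained a major incident; `weeks_over_budget`, the week labels, and the budget value are all illustrative.

```python
def weeks_over_budget(weekly_pages, budget, incident_weeks=frozenset()):
    """Return the week labels whose page count exceeds the budget.

    weekly_pages:   mapping of week label -> page count for one team
    budget:         the team's weekly page budget
    incident_weeks: labels of major-incident weeks, excluded from the check
    """
    return [
        week
        for week, count in sorted(weekly_pages.items())
        if week not in incident_weeks and count > budget
    ]
```

With a budget of 12 and weekly totals of 8, 31, and 14 pages, where the 31-page week is marked as an incident week, only the 14-page week is flagged for tuning time. A quarterly review could count how many weeks this function returns per team.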
Link to retention
Engineers leave teams with bad on-call. Page volume is a leading indicator of attrition.
Healthy on-call retains senior engineers. Mentor pairs build over time when the rotation is sustainable.
Investment in alert quality is investment in team retention. The math compounds across years.