On-Call Page Volume Targets
< 3 pages per shift.
Healthy page volumes
Page volume targets put numbers on what an acceptable on-call shift looks like. Without numbers, fatigue builds unnoticed and engineers leave citing only vague reasons.
- Under 3 pages per shift. Per-engineer per-shift bar; above it is a sign of alert noise or real reliability issues, not heroism.
- Under 10 pages per week. Per-engineer per-week bar; beyond it, on-call quality degrades and engineers stop responding cleanly.
- Under 1 off-hours page per night, averaged across the rotation. Sleep is a precondition for next-day functioning; off-hours pages compound across the rotation.
- Published target per team. Visible bar that the team agreed to; supports buy-in and gives engineers grounds to push back when volume creeps up.
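The bars above can be written down as a small, checkable record. This is a minimal sketch, not a prescribed tool: the names PageTargets and breaches are hypothetical, the default numbers simply mirror the list above, and "under N" is read as strictly fewer than N.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageTargets:
    # Hypothetical container; defaults mirror the bars listed above.
    # A team would publish its own agreed numbers here.
    max_per_shift: int = 3
    max_per_week: int = 10
    max_off_hours_per_night: int = 1


def breaches(targets, pages_this_shift, pages_this_week, off_hours_last_night):
    """Return the names of targets that were missed.

    Assumption: "under N" means strictly fewer than N, so reaching N
    already counts as a breach.
    """
    missed = []
    if pages_this_shift >= targets.max_per_shift:
        missed.append("per_shift")
    if pages_this_week >= targets.max_per_week:
        missed.append("per_week")
    if off_hours_last_night >= targets.max_off_hours_per_night:
        missed.append("off_hours")
    return missed
```

Calling breaches(PageTargets(), 2, 9, 0) returns an empty list, while a shift with 3 pages in a 12-page week and one overnight page reports all three bars as missed.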
Measurement discipline
Targets without measurement are aspirational. Per-engineer, per-service, and per-time-of-day breakdowns each surface a different aspect of the on-call experience.
- Per-engineer per-shift count. Page count per shift, aggregated weekly, trended monthly; surfaces uneven distribution across the rotation.
- Per-service page volume. Noise contribution per service; identifies the services driving most of the rotation pain.
- Per-time-of-day distribution. Business-hours versus after-hours split; the two patterns call for different responses (alert tuning for after-hours noise, process changes for business-hours load).
- Per-quarter trend chart. Volume trajectory across quarters; drift surfaces faster on a chart than in retrospective complaints.
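The per-engineer, per-service, and per-time-of-day breakdowns can all come from the same page log. The sketch below assumes a hypothetical Page record with an engineer, a service, and a local-time timestamp, and it assumes business hours are weekdays 09:00 to 17:00; both are illustrative choices, not part of any particular paging tool.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class Page(NamedTuple):
    # Hypothetical record shape for one page.
    engineer: str
    service: str
    fired_at: datetime  # assumed to be in the responder's local time


def is_business_hours(ts, start_hour=9, end_hour=17):
    # Assumption: business hours are Monday-Friday, 09:00 up to 17:00.
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def breakdowns(pages):
    """Count pages per engineer, per service, and per time-of-day bucket."""
    per_engineer = Counter(p.engineer for p in pages)
    per_service = Counter(p.service for p in pages)
    per_period = Counter(
        "business_hours" if is_business_hours(p.fired_at) else "after_hours"
        for p in pages
    )
    return per_engineer, per_service, per_period


sample = [
    Page("ana", "checkout", datetime(2024, 3, 4, 10, 15)),  # Monday morning
    Page("ana", "checkout", datetime(2024, 3, 5, 2, 40)),   # Tuesday, overnight
    Page("ben", "search", datetime(2024, 3, 9, 14, 0)),     # Saturday
]
per_engineer, per_service, per_period = breakdowns(sample)
```

Running the same function over weekly or quarterly slices of the log gives the inputs for the trend chart.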
Responding to overflow
Overflow has three response levels: rotation, service, and architecture. Each addresses a different cause; pick the right level for the symptom.
- Rotation level. Emergency staffing per rotation when volume is unsustainable; backup on-call activated and alert tuning prioritised.
- Service level. Tuning sprint for the noisiest services; engineering capacity reallocated until volume drops to target.
- Architecture level. Persistently noisy services may indicate architectural problems; tactical fixes do not address them.
- Documented response per overflow. Named owner and timeline per overflow event; supports accountability and prevents the response from drifting.
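A documented response needs little more than a level, an owner, and a deadline. The record below is one possible shape, offered as a sketch; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Level(Enum):
    # The three response levels described above.
    ROTATION = "rotation"
    SERVICE = "service"
    ARCHITECTURE = "architecture"


@dataclass
class OverflowResponse:
    # Hypothetical record for one overflow event.
    level: Level
    owner: str       # the named owner
    opened: date
    due: date        # the agreed timeline
    resolved: bool = False

    def is_overdue(self, today):
        """True when the response is still open past its due date."""
        return not self.resolved and today > self.due


response = OverflowResponse(
    level=Level.SERVICE,
    owner="ana",
    opened=date(2024, 3, 4),
    due=date(2024, 3, 18),
)
```

Checking is_overdue in a weekly review is one way to notice a response that has started to drift.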
Page budget discipline
Page budgets work like error budgets. Going over budget pulls engineering time into alert tuning; staying under it frees that time for feature work; chronic overage triggers staffing or architectural investment.
- Weekly page budget per team. Bar set per team; above it pulls engineering time into tuning, below it releases time for features.
- Quarterly budget-achievement review. Chronically over-budget teams need staffing or architectural investment, not another quarter of tactical fixes.
- Major-incident overflow exemption. Incident weeks do not count against the budget; the metric is normal-week volume.
- Named budget owner per team. Steward who tracks the budget and escalates when volume runs over it.
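The budget rules can be sketched as a small calculation over weekly page counts. Everything named here is an assumption for illustration: the function budget_status, the week labels, and in particular the definition of "chronic" as more than half of normal weeks over budget, which a team would replace with its own threshold.

```python
def budget_status(weekly_pages, budget, incident_weeks=frozenset()):
    """Summarise budget performance over a review period.

    weekly_pages: dict mapping a week label to that week's page count.
    incident_weeks: labels of major-incident weeks, which are exempt.
    """
    # Exempt major-incident weeks: the metric is normal-week volume.
    normal = {w: n for w, n in weekly_pages.items() if w not in incident_weeks}
    over = sorted(w for w, n in normal.items() if n > budget)
    return {
        "normal_weeks": len(normal),
        "over_budget_weeks": over,
        # Assumption: "chronic" means over budget in more than half
        # of the normal weeks in the period.
        "chronic": len(normal) > 0 and 2 * len(over) > len(normal),
    }


status = budget_status(
    {"W01": 8, "W02": 14, "W03": 40, "W04": 12},
    budget=10,
    incident_weeks={"W03"},
)
```

In this sample, W03 is excluded as an incident week, W02 and W04 are over budget, and two of three normal weeks over budget marks the period as chronic under the stated assumption.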
Link to retention
Page volume is a leading indicator of attrition, and the effect compounds: bad on-call drives senior engineers out, mentor pairs stop forming, and the rotation gets worse for the engineers who stay.
- Engineers leave bad on-call. Volume-driven attrition is real; page volume can predict departures better than satisfaction surveys do.
- Healthy on-call retains seniors. Mentor pairs build over time when the rotation is sustainable; bad rotations break that pattern.
- Alert-quality investment is retention investment. Quarterly alert tuning produces the same effect as a retention bonus, more durably.
- Exit-interview signal. On-call mentions in exit interviews catch root causes that aggregate metrics miss.