Pager Load Balancing Across Services
Some services page far more often than others. Distribute that load deliberately instead of leaving it with whoever owns the noisy service.
Rotate engineers across services
Static service ownership concentrates pager load on whoever drew the noisy service. Rotation is the cheapest mechanism to spread it.
- Cadence. Quarterly rotation within a domain, or driven by incident volume when one service spikes.
- Expertise spread. Multiple engineers know each service end to end; bus factor goes up, hand-off cost goes down.
- Load spread. The same person does not always carry the noisy service; burnout risk drops.
- Onboarding lag. First two weeks of a rotation are slower; budget for it in shift planning.
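A minimal sketch of a quarterly rotation, assuming one primary engineer per service and a simple round-robin shift. The function name `rotate_assignments` and the engineer and service names are hypothetical, chosen only for illustration; a real schedule would also need to handle vacations and the onboarding lag noted above.

```python
from collections import deque


def rotate_assignments(engineers, services, quarters):
    """Return one {service: engineer} mapping per quarter.

    Assumption: each service has a single primary engineer, and everyone
    shifts over by one service at each quarter boundary (round-robin).
    """
    if len(engineers) < len(services):
        raise ValueError("need at least one engineer per service")
    ring = deque(engineers)
    schedule = []
    for _ in range(quarters):
        # Pair services with the current ordering of engineers.
        schedule.append(dict(zip(services, ring)))
        # Shift the ring so the next quarter gives each service a new owner.
        ring.rotate(-1)
    return schedule


# Hypothetical team and services.
schedule = rotate_assignments(
    ["ana", "bo", "chen"],
    ["checkout", "search", "billing"],
    quarters=3,
)
for quarter, assignment in enumerate(schedule, start=1):
    print(f"Q{quarter}: {assignment}")
```

With three engineers and three services, every engineer owns every service exactly once over three quarters, which is the expertise and load spread the list describes.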
Match staffing to volume
Rotation size should match page volume. A three-engineer rotation on a noisy service produces three burned-out engineers per quarter.
- Noisy services. 8-10 engineers in rotation when pages are frequent; nobody carries the pager for more than a week at a time.
- Quiet services. Three-engineer rotations are sustainable for low-volume services; fewer is fragile.
- Cross-service backup. Engineers from quieter services back up busier ones during launch or seasonal peaks.
- Floor. Below three engineers, a single sick day or vacation breaks coverage; expand or merge.
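The sizing rule can be sketched as simple arithmetic. The floor of three comes from the list above; the per-engineer page budget (`max_pages_per_engineer_per_quarter`) and the 13-week quarter are assumptions introduced here for illustration, not thresholds the text prescribes.

```python
import math

MIN_ROTATION = 3  # floor from the text: below this, one absence breaks coverage


def rotation_size(pages_per_week, max_pages_per_engineer_per_quarter,
                  weeks_per_quarter=13):
    """Estimate how many engineers a rotation needs.

    Assumption: pages are shared evenly, and each engineer has a
    hypothetical quarterly page budget they should not exceed.
    """
    if max_pages_per_engineer_per_quarter <= 0:
        raise ValueError("page budget must be positive")
    total_pages = pages_per_week * weeks_per_quarter
    needed = math.ceil(total_pages / max_pages_per_engineer_per_quarter)
    # Never go below the coverage floor, however quiet the service is.
    return max(MIN_ROTATION, needed)


noisy = rotation_size(pages_per_week=20, max_pages_per_engineer_per_quarter=30)
quiet = rotation_size(pages_per_week=2, max_pages_per_engineer_per_quarter=30)
print(noisy, quiet)
```

Under these assumed numbers the noisy service lands in the 8-10 range and the quiet one sits at the floor of three.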
Quarterly review
Without periodic review, rotations ossify and the engineer who tolerates pages keeps absorbing them.
- Per-engineer pages per shift. Track it across services so imbalances surface in one chart.
- Real load vs noise. Investigate whether high page count reflects real incidents or alert tuning failures.
- Tune first, rotate second. Address underlying causes; fix flapping services before reshuffling humans.
- Document deltas. Note what changed each quarter so the trend is visible across rotations and managers.
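The two review metrics above can be sketched as follows. The `Page` record, its `actionable` flag, and the `shifts_worked` mapping are hypothetical; real data would come from whatever paging tool and shift calendar the team uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    engineer: str
    service: str
    actionable: bool  # False means the page was noise (flapping, mis-tuned alert)


def pages_per_shift(pages, shifts_worked):
    """Average pages per shift for each engineer, across all services."""
    counts = defaultdict(int)
    for page in pages:
        counts[page.engineer] += 1
    # Iterate over shifts_worked so engineers with zero pages still appear.
    return {eng: counts[eng] / n for eng, n in shifts_worked.items() if n > 0}


def noise_ratio(pages):
    """Fraction of each service's pages that were not actionable."""
    total = defaultdict(int)
    noise = defaultdict(int)
    for page in pages:
        total[page.service] += 1
        if not page.actionable:
            noise[page.service] += 1
    return {svc: noise[svc] / total[svc] for svc in total}


# Hypothetical quarter of data.
pages = [
    Page("ana", "checkout", True),
    Page("ana", "checkout", False),
    Page("ana", "checkout", False),
    Page("ana", "search", True),
    Page("bo", "search", True),
]
shifts_worked = {"ana": 2, "bo": 1, "chen": 1}

load = pages_per_shift(pages, shifts_worked)
noise = noise_ratio(pages)
print(load)
print(noise)
```

A high noise ratio on a service points at alert tuning as the first fix; a high pages-per-shift figure with a low noise ratio suggests real load that rotation or staffing has to absorb.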
Compensate for outliers
Some engineers carry more than their share. Recognise it explicitly or they leave; pages-per-engineer data is a leading indicator of attrition.
- Recognition. Time off, stipends, or public credit for engineers carrying extra load this quarter.
- Time-zone equity. Engineers whose local nights and weekends overlap with peak page times get more off-hours pages; compensate or rotate the burden.
- Retention signal. Pages-per-engineer trends inform retention models; sustained outliers predict departures.
- Manager check-in. 1:1 conversation alongside the metric; data without context produces bad decisions.
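A rough sketch of how outliers and off-hours pages might be flagged. The 1.5x-median threshold, the 09:00-18:00 working day, and the fixed UTC offsets (which ignore daylight saving time and weekends) are all assumptions for illustration; the output is a prompt for a conversation, not a verdict.

```python
from datetime import datetime, timedelta, timezone
from statistics import median


def flag_outliers(pages_per_engineer, factor=1.5):
    """Engineers whose page count exceeds `factor` times the team median."""
    typical = median(pages_per_engineer.values())
    return sorted(
        eng for eng, count in pages_per_engineer.items()
        if count > factor * typical
    )


def is_off_hours(page_time_utc, utc_offset_hours, day_start=9, day_end=18):
    """True if the page arrived outside the engineer's local working day.

    Assumption: a fixed UTC offset stands in for the engineer's time zone.
    """
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = page_time_utc.astimezone(local_zone)
    return not (day_start <= local_time.hour < day_end)


# Hypothetical quarterly page counts.
counts = {"ana": 30, "bo": 10, "chen": 12, "dee": 9}
outliers = flag_outliers(counts)

# The same page lands at 06:00 for UTC-8 and 15:00 for UTC+1.
page_time = datetime(2024, 3, 5, 14, 0, tzinfo=timezone.utc)
west_coast_off_hours = is_off_hours(page_time, utc_offset_hours=-8)
central_europe_off_hours = is_off_hours(page_time, utc_offset_hours=1)
print(outliers, west_coast_off_hours, central_europe_off_hours)
```

The median is used instead of the mean so that one heavily paged engineer does not raise the baseline they are being compared against.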