Monitoring the On-Call
The on-call rotation is itself a system that needs monitoring. The metrics fall into three groups: page volume, response time, and rotation health.
Page volume
Page volume is the first signal. Per-shift, per-engineer, per-service cuts each surface different patterns; together they show whether the rotation is sustainable.
- Per-shift, per-engineer, per-service. Multi-cut volume view per rotation; trends and outliers surface within each cut.
- Healthy means bounded and predictable. Per-team volume target; staying within it determines whether the rotation is sustainable.
- Per-quarter trend chart. Per-quarter volume trajectory; catches degrading rotation health before incidents.
- Per-service noise share. Per-service volume contribution; identifies the noisy systems driving the page count.
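The volume cuts and the per-service noise share can be sketched as simple counts over page records. This is a minimal illustration, not a prescribed implementation: the `Page` record, its field names, and the `volume_by` and `noise_share` helpers are all hypothetical, and the sample data is made up.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    # Hypothetical record shape; real paging tools export different fields.
    shift: str
    engineer: str
    service: str


def volume_by(pages, cut):
    """Count pages grouped by one cut: 'shift', 'engineer', or 'service'."""
    return Counter(getattr(p, cut) for p in pages)


def noise_share(pages):
    """Fraction of all pages contributed by each service."""
    counts = volume_by(pages, "service")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {service: n / total for service, n in counts.items()}


# Illustrative data only.
pages = [
    Page("2024-W01", "alice", "checkout"),
    Page("2024-W01", "alice", "checkout"),
    Page("2024-W01", "alice", "search"),
    Page("2024-W02", "bob", "checkout"),
]

per_shift = volume_by(pages, "shift")
per_engineer = volume_by(pages, "engineer")
shares = noise_share(pages)
```

Running the same records through all three cuts is the point: a shift that looks heavy may turn out to be one service paging one engineer repeatedly.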
Response time
Response time has two halves. Page-to-ack measures reachability; ack-to-action measures effectiveness. Both deserve their own metric and their own degradation alert.
- Page to acknowledgement. Per-incident MTTA timer; a reachability and tooling check that the page actually reached the engineer’s phone.
- Acknowledgement to action. Per-incident response-effectiveness timer; separates real engagement from an over-eager ack followed by silence.
- Slowing means burnout or tooling problems. Per-quarter trending-up signal in either timer; leading indicator of rotation degradation.
- Per-quarter cause investigation. Named driver for any degradation; catches "the metric just slipped" complacency.
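The two timers and the per-quarter slowdown check might look like the sketch below. The `Incident` fields, the helper names, and the 10% tolerance are assumptions chosen for illustration; the text does not specify a threshold.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Incident:
    # Hypothetical timestamps; field names are assumptions.
    paged_at: datetime
    acked_at: datetime
    first_action_at: datetime


def page_to_ack_seconds(incident):
    """Reachability half: how long until the page was acknowledged."""
    return (incident.acked_at - incident.paged_at).total_seconds()


def ack_to_action_seconds(incident):
    """Effectiveness half: how long from ack to the first real action."""
    return (incident.first_action_at - incident.acked_at).total_seconds()


def is_slowing(quarterly_medians, tolerance=0.10):
    """True if the latest quarter exceeds the previous one by more than
    `tolerance` (an assumed 10% default, not a value from the text)."""
    if len(quarterly_medians) < 2:
        return False
    previous, latest = quarterly_medians[-2], quarterly_medians[-1]
    return latest > previous * (1 + tolerance)


incident = Incident(
    paged_at=datetime(2024, 1, 8, 10, 0, 0),
    acked_at=datetime(2024, 1, 8, 10, 2, 0),
    first_action_at=datetime(2024, 1, 8, 10, 12, 0),
)
mtta = page_to_ack_seconds(incident)
engagement = ack_to_action_seconds(incident)
```

Keeping the two timers as separate functions mirrors the split in the text: a fast ack with a slow first action degrades a different metric than an unreachable engineer does, so each can carry its own alert.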
Rotation health
Rotation health is the structural metric. Headcount, tenure, departures all signal whether the system that produces on-call is healthy or degrading.
- Engineers per rotation. Per-rotation headcount; drives shift frequency; below 6 engineers, the frequency becomes punishing.
- Tenure on rotation. Per-engineer time-on-rotation; drives experience distribution; uniform low tenure is a turnover signal.
- Voluntary departures. Per-quarter departure rate; leading indicator of staffing problems; matters more than absolute headcount.
- Per-rotation exit-interview signal. Per-departure on-call mention; catches systemic on-call problems before they cause cascading turnover.
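As a rough sketch, the structural signals above reduce to a few checks per rotation. The 6-engineer floor comes from the text; the `Rotation` shape, the 6-month low-tenure cutoff, and the flag names are assumptions made for illustration.

```python
from dataclasses import dataclass

MIN_ENGINEERS = 6  # from the text: below this, shift frequency becomes punishing


@dataclass
class Rotation:
    # Hypothetical shape; one tenure entry per engineer currently on rotation.
    tenures_months: list[float]
    headcount_at_quarter_start: int
    voluntary_departures: int


def departure_rate(rotation):
    """Voluntary departures this quarter as a fraction of starting headcount."""
    if rotation.headcount_at_quarter_start == 0:
        return 0.0
    return rotation.voluntary_departures / rotation.headcount_at_quarter_start


def health_flags(rotation, low_tenure_months=6.0):
    """Return structural warning flags; the tenure cutoff is an assumed default."""
    flags = []
    headcount = len(rotation.tenures_months)
    if headcount < MIN_ENGINEERS:
        flags.append("understaffed")
    # "Uniform low tenure": every engineer is below the cutoff.
    if headcount and max(rotation.tenures_months) < low_tenure_months:
        flags.append("uniform_low_tenure")
    if rotation.voluntary_departures > 0:
        flags.append("voluntary_departures")
    return flags


rotation = Rotation(
    tenures_months=[2, 3, 4, 5],
    headcount_at_quarter_start=5,
    voluntary_departures=1,
)
flags = health_flags(rotation)
rate = departure_rate(rotation)
```

The departure rate is computed against starting headcount rather than current headcount so that a shrinking rotation does not make its own departures look smaller.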