Monitoring the On-Call

The on-call rotation is itself a system that needs monitoring. Three metrics cover it: page volume, response time, and rotation health.

Page volume

Page volume is the first signal. Per-shift, per-engineer, per-service cuts each surface different patterns; together they show whether the rotation is sustainable.
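
The three cuts can be sketched as simple counts over a list of page records. This is a minimal illustration, not a prescribed implementation: the `Page` record, its field names, and the sample data are all hypothetical, and a real paging tool will export something shaped differently.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    # Hypothetical record; map these fields to whatever your paging tool exports.
    shift: str      # e.g. an identifier for the shift the page landed in
    engineer: str   # who was paged
    service: str    # which service raised the page


def page_volume(pages):
    """Count pages along each of the three cuts: shift, engineer, service."""
    return {
        "per_shift": Counter(p.shift for p in pages),
        "per_engineer": Counter(p.engineer for p in pages),
        "per_service": Counter(p.service for p in pages),
    }


# Made-up sample data, for illustration only.
pages = [
    Page("2024-W01-night", "alice", "checkout"),
    Page("2024-W01-night", "alice", "checkout"),
    Page("2024-W01-day", "bob", "search"),
]
volume = page_volume(pages)
```

Keeping the three counters side by side makes it easier to tell a noisy service from an overloaded engineer or a bad shift, which a single total would hide.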

Response time

Response time has two halves. Page-to-ack measures reachability; ack-to-action measures effectiveness. Both deserve their own metric and their own degradation alert.
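
One way to keep the two halves separate is to compute each as its own duration and compare each against its own threshold. The sketch below assumes a page carries three timestamps; the `PageTimeline` type, its field names, and the 5- and 15-minute thresholds are illustrative assumptions, not recommended values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PageTimeline:
    # Hypothetical timestamps; names are assumptions for this sketch.
    paged_at: datetime
    acked_at: Optional[datetime] = None
    first_action_at: Optional[datetime] = None


def page_to_ack(t: PageTimeline) -> Optional[timedelta]:
    """Reachability: how long until the engineer acknowledged the page."""
    if t.acked_at is None:
        return None
    return t.acked_at - t.paged_at


def ack_to_action(t: PageTimeline) -> Optional[timedelta]:
    """Effectiveness: how long from acknowledgement to the first action."""
    if t.acked_at is None or t.first_action_at is None:
        return None
    return t.first_action_at - t.acked_at


def exceeds(duration: Optional[timedelta], threshold: timedelta) -> bool:
    """True when a measured duration is over its threshold."""
    return duration is not None and duration > threshold


# Assumed thresholds, one per metric, so each can alert independently.
ACK_THRESHOLD = timedelta(minutes=5)
ACTION_THRESHOLD = timedelta(minutes=15)

timeline = PageTimeline(
    paged_at=datetime(2024, 1, 1, 3, 0),
    acked_at=datetime(2024, 1, 1, 3, 4),
    first_action_at=datetime(2024, 1, 1, 3, 24),
)
ack_alert = exceeds(page_to_ack(timeline), ACK_THRESHOLD)
action_alert = exceeds(ack_to_action(timeline), ACTION_THRESHOLD)
```

In this made-up example the engineer was reachable within the ack threshold but took longer than the action threshold to start working, so only the second alert fires. A single combined response-time metric would blur that distinction.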

Rotation health

Rotation health is the structural metric. Headcount, tenure, and departures all signal whether the system that produces on-call coverage is healthy or degrading.
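
As a rough sketch, the three structural signals can be summarized from a roster of join and leave dates. The `Member` type, the 90-day departure window, and the sample roster are assumptions made for illustration; what counts as healthy depends on the team.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass
class Member:
    # Hypothetical roster entry: when someone joined and, if applicable, left the rotation.
    joined: date
    left: Optional[date] = None


def rotation_health(members, as_of: date, window_days: int = 90):
    """Summarize headcount, median tenure, and recent departures as of a date."""
    active = [
        m for m in members
        if m.joined <= as_of and (m.left is None or m.left > as_of)
    ]
    tenures = [(as_of - m.joined).days for m in active]
    # Departures that happened within the last `window_days` days (assumed window).
    recent_departures = sum(
        1 for m in members
        if m.left is not None and 0 <= (as_of - m.left).days < window_days
    )
    return {
        "headcount": len(active),
        "median_tenure_days": median(tenures) if tenures else None,
        "recent_departures": recent_departures,
    }


# Made-up roster, for illustration only.
roster = [
    Member(joined=date(2022, 6, 30)),
    Member(joined=date(2024, 3, 31)),
    Member(joined=date(2023, 1, 1), left=date(2024, 5, 31)),
]
health = rotation_health(roster, as_of=date(2024, 6, 30))
```

Tracking these values over time, rather than reading any one snapshot, is what shows whether the rotation is degrading.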