Alert Volume → Burnout Correlation

Studies show that alert volume correlates with on-call burnout. This note covers what the data shows, why it happens, and what to do about it.

What the data shows

The correlation is well documented. PagerDuty’s State of Digital Operations and Catchpoint’s SRE survey show a direct correlation between page volume and on-call attrition (above 5 pages per night, retention drops sharply); internal Google SRE data published in the SRE Workbook shows that teams above 2 incidents per shift report higher burnout scores; and burnout is a lagging indicator, because by the time an engineer quits, the team has already tolerated noisy alerting for 6-12 months.

The mechanism

Three mechanisms drive the correlation. Sleep interruption is dominant (a single overnight page reduces next-day cognitive performance by 20%, and three pages destroy the day); cognitive load compounds (engineers who were paged overnight cannot deep-focus the following day, which delays platform work, which in turn produces more pages); and predictability matters more than volume (10 expected pages on a Tuesday are less damaging than 3 random pages spread over a week).

What to track

Three metrics surface burnout risk: pages per shift, taken as the p95 across the rotation (target under 2 per night); after-hours pages as a ratio of business-hours pages (above 30% suggests poor batching or genuine underinvestment in resilience); and a quarterly on-call survey with two questions, “Was your sleep affected?” and “Could you have prevented the page?” (bin the answers and trend them quarter over quarter).
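The after-hours ratio and the survey trend can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the business-hours window (09:00 to 18:00, Monday to Friday), the function names, and the input shapes are all hypothetical, and page timestamps are assumed to be already converted to the on-call engineer's local time.

```python
from datetime import datetime

# Assumption: business hours are 09:00-17:59 local time, Monday to Friday.
BUSINESS_START, BUSINESS_END = 9, 18


def is_after_hours(ts: datetime) -> bool:
    """True if the page fired outside the assumed business-hours window."""
    if ts.weekday() >= 5:  # Saturday or Sunday
        return True
    return not (BUSINESS_START <= ts.hour < BUSINESS_END)


def after_hours_ratio(pages: list[datetime]) -> float:
    """After-hours pages divided by business-hours pages."""
    after = sum(1 for ts in pages if is_after_hours(ts))
    business = len(pages) - after
    if business == 0:
        # No business-hours pages to compare against.
        return float("inf") if after else 0.0
    return after / business


def survey_trend(responses: dict[str, list[bool]]) -> dict[str, float]:
    """Share of 'yes' answers to one survey question, per quarter."""
    return {q: sum(a) / len(a) for q, a in responses.items() if a}


if __name__ == "__main__":
    pages = [
        datetime(2024, 3, 4, 10, 0),   # Monday, business hours
        datetime(2024, 3, 4, 14, 30),  # Monday, business hours
        datetime(2024, 3, 5, 2, 15),   # Tuesday, overnight
        datetime(2024, 3, 9, 11, 0),   # Saturday
    ]
    print(after_hours_ratio(pages))
    print(survey_trend({"Q1": [True, True, False, False],
                        "Q2": [True, False, False, False]}))
```

A ratio above 0.30 would cross the threshold named above. The survey helper handles one question at a time, so it would be called once for each of the two questions.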

Interventions that work

Three interventions reliably reduce burnout: quarterly noise audits with a fixed budget for tuning (make tuning a tracked metric, not background work); paying for on-call (even modest stipends signal that the time is valued, and the cultural effect is larger than the dollar amount); and spreading the load (two-week rotations are easier to absorb than week-long ones; a primary plus secondary plus manager-on-call structure halves the load on each).

Apply this quarter

The application is concrete. Pull last quarter’s pages from PagerDuty and compute the median, p95, and after-hours ratio per rotation; if any rotation exceeds 5 pages per shift at p95, freeze feature work on the owning service until the noise budget is back under target; and run the burnout survey once this quarter, because the first run sets a baseline and subsequent runs measure intervention effectiveness.
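The per-rotation summary and the freeze check could look roughly like the sketch below. It assumes the pages have already been exported and counted per shift; the input dictionary, the rotation names, and the function names are hypothetical, and the percentile uses the nearest-rank method, which is one of several reasonable definitions.

```python
import math
from statistics import median

# Threshold from the text: freeze when p95 pages per shift exceeds 5.
FREEZE_P95 = 5


def p95(values: list[int]) -> int:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(95 * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(pages_per_shift: dict[str, list[int]]) -> dict[str, dict]:
    """Median, p95, and freeze flag for each rotation."""
    report = {}
    for rotation, counts in pages_per_shift.items():
        if not counts:
            continue  # no shifts recorded for this rotation
        p = p95(counts)
        report[rotation] = {
            "median": median(counts),
            "p95": p,
            "freeze": p > FREEZE_P95,
        }
    return report


if __name__ == "__main__":
    # Hypothetical per-shift page counts for two rotations.
    data = {
        "payments": [0, 1, 0, 2, 7, 1, 0, 3, 6, 1],
        "search": [0, 0, 1, 2, 1, 0, 1],
    }
    for rotation, stats in summarize(data).items():
        print(rotation, stats)
```

With the sample data, the payments rotation has a p95 of 7 and would be flagged for a freeze, while search has a p95 of 2 and would not.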