Alert Fatigue Survey
Surveying on-call about alert quality. Quarterly.
Why a quarterly survey
Metrics tell you alert volume. They don't tell you which alerts the on-call hates.
A 10-question survey sent to every on-call engineer once a quarter surfaces the qualitative noise: confusing alerts, useless runbooks, alerts that always fire during deploys.
Anonymous, 5 minutes to fill out. Run it the week after each rotation.
The questions that matter
How many pages did you receive this quarter that required no action? Count, not percentage.
Which 3 alerts were the most painful, and why? Free text.
For each page you received, did its runbook actually help? Yes/no per page.
On a scale of 1-10, how rested are you after this rotation?
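The four core questions map naturally onto a structured response record, which keeps answers comparable from quarter to quarter. A minimal Python sketch, assuming a hypothetical `SurveyResponse` schema (the field names and the `runbook_hit_rate` helper are illustrative, not taken from any survey tool) that rejects malformed answers at construction time:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SurveyResponse:
    """One anonymous answer set for the four core questions (hypothetical schema)."""

    no_action_pages: int             # Q1: raw count, not a percentage
    painful_alerts: list[str]        # Q2: up to 3 alert names
    pain_reasons: list[str]          # Q2: free-text "why", one per painful alert
    runbook_helped: dict[str, bool]  # Q3: page id -> did its runbook help?
    rest_score: int                  # Q4: 1 (exhausted) to 10 (fully rested)

    def __post_init__(self) -> None:
        # Reject answers that would make quarter-over-quarter comparison meaningless.
        if self.no_action_pages < 0:
            raise ValueError("no_action_pages must be a non-negative count")
        if len(self.painful_alerts) > 3:
            raise ValueError("name at most 3 painful alerts")
        if len(self.pain_reasons) != len(self.painful_alerts):
            raise ValueError("give one reason per painful alert")
        if not 1 <= self.rest_score <= 10:
            raise ValueError("rest_score must be between 1 and 10")

    def runbook_hit_rate(self) -> Optional[float]:
        """Fraction of rated pages whose runbook helped, or None if none were rated."""
        if not self.runbook_helped:
            return None
        return sum(self.runbook_helped.values()) / len(self.runbook_helped)
```

Storing the Q1 answer as a count rather than a percentage matches the question's wording and lets you divide by shifts worked later.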
Act on the results within 30 days
Survey results without action breed cynicism. Publish a public alert-cleanup list within 2 weeks.
The top 3 painful alerts get retuned or retired by the next quarter. Publish the diff of the alert rules.
If the rest score is below 6, shrink each engineer's share of the rotation (more people, shorter shifts) or pull alerts from it. Burnout is not a metrics problem.
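Picking the top 3 painful alerts and checking the rest threshold is easy to script. A minimal sketch, assuming each respondent's free-text answers have already been normalized to alert names, and assuming the rotation's rest score is the median across respondents (the text does not say how individual scores combine). `top_painful_alerts` and `needs_load_reduction` are hypothetical helpers:

```python
from collections import Counter
from statistics import median

REST_THRESHOLD = 6  # from the text: a rest score below 6 triggers load reduction


def top_painful_alerts(mentions: list[list[str]], k: int = 3) -> list[tuple[str, int]]:
    """Rank alerts by how many respondents named them as painful.

    `mentions` holds one list of alert names per respondent. Each respondent
    counts at most once per alert; ties are broken alphabetically.
    """
    counts = Counter()
    for names in mentions:
        counts.update(set(names))  # dedupe within a single response
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def needs_load_reduction(rest_scores: list[int]) -> bool:
    """True if the median rest score (assumed aggregation) is below the threshold."""
    return median(rest_scores) < REST_THRESHOLD
```

Counting each alert once per respondent keeps one very frustrated engineer from dominating the ranking, and the alphabetical tie-break makes the published cleanup list reproducible.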
Trend, don't snapshot
One survey is a snapshot; four surveys are a trend. Compare quarter over quarter.
Track: median noise pages per shift, painful-alert count, rest score. Plot these against alert volume metrics.
A reversal (noise pages falling, rest scores rising) means the cleanup ritual is working. A flat or worsening trend means you need a bigger intervention.
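The three tracked numbers can be rolled up per quarter and compared over time. A minimal sketch under stated assumptions: each response is a hypothetical `(no_action_pages, shifts_worked, painful_alerts, rest_score)` tuple, and the verdict rule (noise falling, painful-alert count not rising, rest rising, judged by least-squares slope over the last four quarters) is an illustrative choice, not a prescribed threshold. `summarize`, `slope`, and `verdict` are hypothetical names:

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical response shape: (no_action_pages, shifts_worked, painful_alerts, rest_score)
Response = tuple[int, int, list[str], int]


@dataclass(frozen=True)
class QuarterSummary:
    quarter: str
    noise_pages_per_shift: float  # median across respondents
    painful_alert_count: int      # distinct alerts named as painful by anyone
    rest_score: float             # median rest score (1-10)


def summarize(quarter: str, responses: list[Response]) -> QuarterSummary:
    # Skip respondents who worked no shifts to avoid dividing by zero.
    worked = [r for r in responses if r[1] > 0]
    per_shift = [pages / shifts for pages, shifts, _, _ in worked]
    painful = {name for _, _, names, _ in worked for name in names}
    rests = [rest for _, _, _, rest in worked]
    return QuarterSummary(quarter, median(per_shift), len(painful), median(rests))


def slope(values: list[float]) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...); needs 2+ values."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def verdict(history: list[QuarterSummary], window: int = 4) -> str:
    """Classify the trend over the last `window` quarters (assumed rule)."""
    recent = history[-window:]
    if len(recent) < 2:
        return "snapshot only"
    improving = (
        slope([q.noise_pages_per_shift for q in recent]) < 0
        and slope([q.painful_alert_count for q in recent]) <= 0
        and slope([q.rest_score for q in recent]) > 0
    )
    return "cleanup is working" if improving else "needs a bigger intervention"
```

Medians keep one engineer's brutal week from swinging a whole quarter, and the four-quarter window mirrors "four surveys are a trend."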
Who runs the survey
On-call manager or SRE lead. Not the team that owns the noisy service; having them run it puts conflict-of-interest pressure on responses.
5 minutes per person, 4 times a year. Total cost: 20 minutes per engineer per year.
ROI: alerts retired, rotations rebalanced, attrition reduced. The cheapest signal you can buy.