Alert Fatigue Survey

A quarterly survey of on-call engineers about alert quality.

Why a quarterly survey

Metrics tell you alert volume but not which alerts on-call engineers hate. A 10-question survey sent to every on-call engineer once a quarter surfaces the qualitative noise: confusing alerts, useless runbooks, and alerts that always fire during deploys. Keep it anonymous and short (5 minutes to fill out), and send it the week after each rotation.

The questions that matter

Four core questions cover most of the ground: how many pages required no action (a count, not a percentage); which three alerts were the most painful, and why (free text); whether each page had a runbook that actually helped (yes/no per page); and, on a 1-10 scale, how rested you are after this rotation.
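One way to keep the four core answers consistent across quarters is to validate each response against a fixed shape. The sketch below is a minimal illustration, not a prescribed schema: the class and field names are hypothetical, and reading 10 as "fully rested" is an assumption consistent with the "below 6" threshold used later.

```python
from dataclasses import dataclass


@dataclass
class SurveyResponse:
    """One anonymous response to the four core questions.

    Hypothetical field names; adapt them to whatever form tool you use.
    """

    no_action_pages: int         # count of pages that needed no action (not a percentage)
    painful_alerts: list[str]    # up to 3 alert names, most painful first
    painful_reasons: list[str]   # free-text "why", one per entry in painful_alerts
    runbook_helped: list[bool]   # one yes/no answer per page received
    rest_score: int              # assumed scale: 1 = exhausted, 10 = fully rested

    def __post_init__(self) -> None:
        # Reject answers that would skew the quarter-over-quarter comparison.
        if self.no_action_pages < 0:
            raise ValueError("no_action_pages must be a non-negative count")
        if len(self.painful_alerts) > 3:
            raise ValueError("list at most 3 painful alerts")
        if len(self.painful_reasons) != len(self.painful_alerts):
            raise ValueError("give one reason per painful alert")
        if not 1 <= self.rest_score <= 10:
            raise ValueError("rest_score must be between 1 and 10")
```

Asking for a raw count rather than a percentage means the answer does not depend on each engineer's mental estimate of their total page volume.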

Act on the results within 30 days

Acting on the results within 30 days is what keeps the survey alive. Publish a public alert-cleanup list within two weeks. Retune or retire the top three painful alerts by the next quarter, and publish the diff. If the rest score is below 6, reduce the rotation size or pull alerts: burnout is not a metrics problem.
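The follow-up rules can be sketched as a small function that turns a quarter's responses into the cleanup targets and a rotation flag. Two choices here are assumptions rather than anything the process specifies: "top 3 painful alerts" is taken to mean the alerts named most often, and "the rest score" is taken to mean the median across respondents.

```python
from collections import Counter
from statistics import median


def plan_actions(painful_alerts_per_response, rest_scores, rest_threshold=6):
    """Return (top-3 alerts to retune or retire, whether rotation load needs relief).

    painful_alerts_per_response: one list of alert names per survey response.
    rest_scores: the 1-10 rest score from each response.
    """
    # Count how many respondents named each alert as painful.
    mentions = Counter(name for alerts in painful_alerts_per_response for name in alerts)
    top_three = [name for name, _ in mentions.most_common(3)]
    # Median resists one unusually bad (or good) rotation skewing the verdict.
    relieve_rotation = median(rest_scores) < rest_threshold
    return top_three, relieve_rotation
```

For example, if `disk-full` is named in three responses and the rest scores are 5, 7, 4 and 6, `disk-full` leads the cleanup list and the median of 5.5 triggers the rotation flag.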

Trend, don't snapshot

One survey is a snapshot; four surveys are a trend. Compare quarter over quarter: track the median number of noise pages per shift, the painful-alert count, and the rest score, and plot them against your alert-volume metrics. A trend reversal means the cleanup ritual is working; a stable or worsening trend means you need a bigger intervention.
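A minimal sketch of the quarter-over-quarter check, assuming per-shift noise-page counts are grouped by quarter. The function names and the three-quarter minimum are illustrative assumptions; "reversal" is read here as the latest change being an improvement after a flat or worsening change.

```python
from statistics import median


def quarterly_medians(noise_pages_by_quarter):
    """Map each quarter label to the median noise pages per shift."""
    return {q: median(counts) for q, counts in noise_pages_by_quarter.items()}


def trend_verdict(values, lower_is_better=True):
    """Classify a quarterly series such as noise pages or rest scores.

    Use lower_is_better=False for the rest score, where higher is better.
    """
    if len(values) < 3:
        return "not enough quarters"
    # Quarter-over-quarter changes, signed so that positive means "got worse".
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not lower_is_better:
        deltas = [-d for d in deltas]
    if deltas[-1] < 0 and deltas[-2] >= 0:
        return "reversal: cleanup ritual is working"
    if deltas[-1] < 0:
        return "still improving"
    return "stable or worsening: needs a bigger intervention"
```

With per-shift counts of [4, 6, 5], [6, 7, 8] and [3, 4, 2] over three quarters, the medians are 5, 7 and 3, which this sketch reports as a reversal.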

Who runs the survey

The owner matters for honesty. The on-call manager or SRE lead should run the survey, not the team that owns the noisy service, because that ownership creates conflict-of-interest pressure on responses. At 5 minutes per person 4 times a year, the cost is 20 minutes per engineer per year; the return is alerts retired, rotations rebalanced, and attrition reduced.
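To size the cost side for a whole rotation, the per-engineer arithmetic scales linearly with headcount. The function below is a hypothetical helper using the 5-minute, 4-times-a-year figures from the text as defaults.

```python
def survey_cost_hours(engineers, minutes_per_survey=5, surveys_per_year=4):
    """Total engineer-hours per year spent filling out the survey."""
    # 5 minutes x 4 surveys = 20 minutes per engineer per year.
    return engineers * minutes_per_survey * surveys_per_year / 60
```

A hypothetical 30-engineer on-call pool spends `survey_cost_hours(30)`, or 10 engineer-hours a year, on the survey.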