Alert Noise by Team Attribution
Some teams' alerts are noisier than others'. Attribute the noise, then act on it.
Why per-team attribution matters
Without attribution, alert noise is SRE’s problem; with attribution, it belongs to the team that wrote the rule. Per-team noise scores reveal which teams generate the most noise per service, and surprising results are the norm. Attribution drives behavior change faster than any internal training program.
- Ownership shift. Without attribution, noise is SRE’s problem; with attribution, it belongs to the team that wrote the rule.
- Per-team noise scores. Reveal which teams generate the most noise per service; surprising results are the norm.
- Behavior change accelerator. Attribution drives change faster than internal training programs.
- Per-team accountability. The team owns its rule output; the discipline lives where the change happens.
How to attribute
Attribution needs three mechanisms. Tag every rule with a team label as mandatory metadata; aggregate fire counts by team weekly with a leaderboard sorted by signal-to-noise; pull team mappings from the service catalog and route untagged rules to an unowned bucket that SRE actively shrinks.
- Mandatory team tag. owner_team is required metadata; rules without it route to the unowned bucket.
- Weekly aggregation. Fire counts per team published as a leaderboard sorted by signal-to-noise.
- Service catalog source. Team mappings pulled from Backstage or equivalent; the catalog is the source of truth.
- Unowned bucket. Default destination for untagged rules; SRE actively shrinks it as a backlog.
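The weekly aggregation and the unowned bucket can be sketched in a few lines. This is a minimal illustration, not a prescribed implementation: the `Fire` record, its `actionable` flag, and the definition of signal-to-noise as the share of actionable fires are all assumptions made for the example. Only the `owner_team` tag comes from the text above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

UNOWNED = "unowned"  # default bucket for rules with no owner_team tag


@dataclass
class Fire:
    rule: str
    owner_team: Optional[str]  # mandatory tag; None or "" means untagged
    actionable: bool           # hypothetical flag: did a human need to act?


def leaderboard(fires):
    """Return (team, total_fires, signal_ratio) rows, lowest signal first."""
    totals = defaultdict(int)
    signal = defaultdict(int)
    for fire in fires:
        team = fire.owner_team or UNOWNED  # untagged rules land in the unowned bucket
        totals[team] += 1
        if fire.actionable:
            signal[team] += 1
    rows = [(team, totals[team], signal[team] / totals[team]) for team in totals]
    # Assumed ordering: worst signal ratio first, ties broken by higher volume.
    rows.sort(key=lambda row: (row[2], -row[1]))
    return rows


week = [
    Fire("disk-full", "payments", True),
    Fire("cpu-spike", "payments", False),
    Fire("legacy-ping", None, False),
    Fire("latency-p99", "search", True),
]
for team, total, ratio in leaderboard(week):
    print(f"{team:10s} fires={total} signal={ratio:.2f}")
```

Sorting worst-first puts the unowned bucket and the noisiest teams at the top; reversing the sort key gives a best-first view if that suits the improvement framing better.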
Metrics to publish
Five metrics tell the story: total fires per team per week, auto-resolved fires per team, pages per on-call shift per team, cost per page (vendor fees plus interruption time; some teams cost 10x as much as others), and improvement velocity (week-over-week change in noise score).
- Total fires per team per week. The headline metric; the simple count.
- Auto-resolved fires. Self-clearing alerts that probably shouldn’t fire at all.
- Pages per on-call shift. Pages counted per team and normalised by on-call shift.
- Cost per page. Vendor fees plus interruption time; some teams generate 10x the cost of others.
- Improvement velocity. Week-over-week change in noise score; supports the trend, not just the snapshot.
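The metrics above reduce to simple arithmetic. The sketch below is illustrative only. The text does not define the noise score, so the example assumes it is the share of fires that auto-resolved. The `TeamWeek` fields, the hourly rate, and the interruption minutes are likewise hypothetical inputs.

```python
from dataclasses import dataclass


@dataclass
class TeamWeek:
    total_fires: int     # headline metric: all fires for the team this week
    auto_resolved: int   # fires that cleared without human action
    pages: int           # fires that paged a human
    on_call_shifts: int  # shifts covered during the week


def noise_score(week: TeamWeek) -> float:
    """Assumed definition: fraction of fires that auto-resolved."""
    return week.auto_resolved / week.total_fires if week.total_fires else 0.0


def pages_per_shift(week: TeamWeek) -> float:
    return week.pages / week.on_call_shifts if week.on_call_shifts else 0.0


def cost_per_page(vendor_fee: float, interruption_minutes: float, hourly_rate: float) -> float:
    """Vendor fee plus the cost of the engineer's interrupted time."""
    return vendor_fee + (interruption_minutes / 60) * hourly_rate


def improvement_velocity(previous: TeamWeek, current: TeamWeek) -> float:
    """Week-over-week change in noise score; negative means less noise."""
    return noise_score(current) - noise_score(previous)


last_week = TeamWeek(total_fires=100, auto_resolved=60, pages=20, on_call_shifts=7)
this_week = TeamWeek(total_fires=80, auto_resolved=40, pages=14, on_call_shifts=7)

print(f"noise score: {noise_score(this_week):.2f}")
print(f"pages per shift: {pages_per_shift(this_week):.1f}")
print(f"cost per page: {cost_per_page(0.5, 30, 100.0):.2f}")
print(f"velocity: {improvement_velocity(last_week, this_week):+.2f}")
```

Publishing velocity alongside the raw score lets a team with a high score but a steady downward trend be distinguished from one that is standing still.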
Aligning incentives
Aligning incentives makes attribution real. Tie SLO budget to noise budget so that teams in the top noise quintile lose change-management privileges until their score drops; make new rule creation conditional on the existing noise score, with a delete-one-add-one rule for high-noise teams; recognise improvement publicly to reinforce the discipline.
- SLO-to-noise budget tie. Teams in the top noise quintile lose change-management privileges until their score drops.
- Delete-one-add-one rule. High-noise teams must delete one rule per new rule; supports active rule curation.
- Public recognition. A team that cuts noise 40% in a quarter gets a shoutout; reinforces the discipline.
- Per-quarter incentive review. Incentives reviewed for actual behaviour change; supports continuous tuning.
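The quintile cutoff and the delete-one-add-one gate can be expressed as a small policy check. This is a sketch under stated assumptions: "high-noise" is taken to mean the top quintile by noise score, at least one team is always selected, and the function names and counters are hypothetical rather than part of any existing tool.

```python
import math


def top_quintile(scores: dict[str, float]) -> set[str]:
    """Teams in the noisiest fifth (rounded up, at least one team)."""
    if not scores:
        return set()
    cutoff = max(1, math.ceil(len(scores) / 5))
    ranked = sorted(scores, key=scores.get, reverse=True)  # noisiest first
    return set(ranked[:cutoff])


def may_add_rule(team: str, scores: dict[str, float],
                 rules_deleted: int, rules_added: int) -> bool:
    """Delete-one-add-one: a high-noise team needs one unused deletion per new rule."""
    if team not in top_quintile(scores):
        return True  # teams outside the top quintile are not gated
    return rules_deleted > rules_added


scores = {"payments": 0.9, "search": 0.5, "infra": 0.4, "mobile": 0.3, "data": 0.1}

print(top_quintile(scores))
print(may_add_rule("search", scores, rules_deleted=0, rules_added=3))
print(may_add_rule("payments", scores, rules_deleted=0, rules_added=0))
print(may_add_rule("payments", scores, rules_deleted=1, rules_added=0))
```

Counting deletions and additions over the same period (for example, since the team entered the top quintile) keeps the gate from rewarding old cleanups indefinitely.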
Start with a public dashboard
Don’t enforce penalties before publishing data. Visibility alone moves the needle 30%; skip naming-and-shaming language and frame the scores as team improvement; audit attribution accuracy monthly, because misattributed noise erodes trust in the whole system.
- Publish before penalising. Visibility alone moves the needle 30%; the data is the first lever.
- Improvement framing. Skip naming-and-shaming; frame as a team improvement metric, not a leaderboard of failures.
- Monthly attribution audit. Misattributed noise erodes trust in the whole system; verify the data.
- Per-team coaching offered. The data points to teams that need coaching; offer it before penalty.
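A monthly attribution audit can be as simple as comparing each rule's `owner_team` tag with the owner recorded in the service catalog. The sketch below assumes rules and catalog entries are already exported as plain dictionaries; the field names `name` and `service` are hypothetical, and no real catalog API is called.

```python
def audit_attribution(rules, catalog):
    """Return (rule_name, tagged_team, catalog_team) for every disagreement.

    A rule is flagged when it has no tag, its service is missing from the
    catalog, or the tag differs from the catalog owner.
    """
    mismatches = []
    for rule in rules:
        tagged = rule.get("owner_team")
        expected = catalog.get(rule["service"])  # catalog is the source of truth
        if tagged is None or expected is None or tagged != expected:
            mismatches.append((rule["name"], tagged, expected))
    return mismatches


catalog = {"checkout": "payments", "search-api": "search"}
rules = [
    {"name": "checkout-5xx", "service": "checkout", "owner_team": "payments"},
    {"name": "search-latency", "service": "search-api", "owner_team": "payments"},
    {"name": "checkout-queue", "service": "checkout"},
]

for name, tagged, expected in audit_attribution(rules, catalog):
    print(f"{name}: tagged={tagged} catalog={expected}")
```

Tracking the length of this mismatch list month over month gives a rough measure of how far the published scores can be trusted.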