Alerts · Intermediate · By Samson Tanimawo, PhD · Published Aug 25, 2026 · 8 min read

Pager-Load Budgeting

Borrow the SLO error-budget idea for the on-call. Cap pager-pages-per-week per team, enforce it like an SLO, and the noise drops because the budget makes it everyone’s problem.

Why error budgets work

Error budgets work because they put a number on a thing that used to be a feeling. “The site is down too often” doesn’t move anyone; “we’ve burned 60% of the quarterly budget in 3 weeks” does. The same idea applies to pager load: “the on-call is suffering” doesn’t move anyone; “we’re 30 pages over the weekly cap” does.

The structural reason it works. Without a budget, the cost of adding an alert is zero to the author and infinite to the on-call. The budget puts a price on the alert; the team has to decide whether the new alert is worth more than an existing one. Suddenly alert authoring becomes a finite-resource problem.

The cultural reason it works. The budget makes pager noise a team metric, not a personal complaint. The on-call doesn’t have to argue alert-by-alert in retro; they point at the budget chart. The conversation is about which alert to retire to make room, not whether the noise is “really that bad.”

Defining the budget

Pick a number per team. We use 2 pages per on-call shift (a week) as the soft cap and 5 as the hard cap: roughly 8 pages per month soft, 20 hard. The number isn’t magic, but it has to feel achievable for your team to take it seriously. Start where you are; tighten over time.

The unit. “Page” means a phone-buzzing page, not an inbox alert. Out-of-hours pages count double; in-hours pages count single. The asymmetry reflects the real cost: a sleep interruption is more expensive than a daytime context-switch.

The denominator. Per team, per week. Not per service, not per cluster. The team is the unit that owns the alerting; the budget belongs to the same unit that can fix the noise. Per-service is too granular; per-org is too blunt.

The exclusion list. Real, customer-impact incidents are excluded from the budget. The budget is for noise, not for incidents you wanted to know about. Build a labelling discipline: every page gets tagged “real-incident” or “noise” in retro; only the noise counts against budget.
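The accounting above (double-weight out-of-hours pages, exclude retro-tagged real incidents, compare against the caps) fits in a few lines. A minimal sketch; the `Page` record, its field names, and the cap constants are illustrative assumptions, not a real tool:

```python
from dataclasses import dataclass

@dataclass
class Page:
    alert_name: str
    out_of_hours: bool   # buzzed someone outside working hours
    real_incident: bool  # tagged "real-incident" in retro

SOFT_CAP = 2  # weighted noise pages per team-week (article's soft cap)
HARD_CAP = 5  # hard cap

def weekly_burn(pages: list[Page]) -> int:
    """Weighted noise-page count for one team-week.

    Real incidents are excluded from the budget; out-of-hours
    pages count double, in-hours pages count single.
    """
    return sum(
        2 if p.out_of_hours else 1
        for p in pages
        if not p.real_incident
    )

week = [
    Page("disk-usage-warn", out_of_hours=True, real_incident=False),
    Page("checkout-5xx", out_of_hours=True, real_incident=True),  # excluded
    Page("queue-lag", out_of_hours=False, real_incident=False),
]
print(weekly_burn(week))             # 3: one night page (x2) + one day page
print(weekly_burn(week) > SOFT_CAP)  # True: over the soft cap this week
```

The only design decision that matters here is that the exclusion happens at counting time, from a retro-applied tag, so nobody can argue a page out of the budget mid-week.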

Enforcing the budget

A budget without enforcement is a wall poster. Three enforcement levers, in order of strength.

Lever 1: Weekly review of overage. When a team blows through the cap, the next week’s sprint planning starts with the question “which alert do we silence or fix to get back under?” Not optional, not deferred. The team has to make a trade-off in front of each other.

Lever 2: Feature freeze on chronic offenders. If a team is over budget for 4 weeks running, no new features ship until the alerting is fixed. This is the SLO-style lever from Google’s playbook applied to alerts: the team owns the noise, so the team has to fix it before it gets to add more.

Lever 3: Alert promotion gating. New alerts can’t go to pager unless the team has 20%+ of their budget unspent. The new alert author has to decide which existing alert to retire to make room. Forces ruthless prioritisation.
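Lever 3’s gate is mechanical enough to automate in a CI check or a promotion script. A hedged sketch, assuming the same weekly soft cap as above; the function and parameter names are mine, not a real API:

```python
SOFT_CAP = 2  # weighted noise pages per team-week (assumed cap)

def can_promote_to_pager(pages_burned: float,
                         soft_cap: float = SOFT_CAP,
                         headroom: float = 0.20) -> bool:
    """Gate from Lever 3: a new alert may page only if at least
    `headroom` (20%) of the team's weekly budget is unspent."""
    unspent_fraction = max(soft_cap - pages_burned, 0) / soft_cap
    return unspent_fraction >= headroom

print(can_promote_to_pager(1.0))  # True: 50% of the budget is unspent
print(can_promote_to_pager(1.8))  # False: only 10% unspent, retire something first
```

A “False” here is not a rejection of the alert, it is the prompt for the trade-off the lever demands: which existing alert gets retired to make room.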

The culture-keeper. The budget enforcer can’t be the on-call; they’re too tired to argue. It has to be the EM or the staff engineer: someone with the authority to say “we’re shipping nothing this week, we’re fixing alerts.”

Tracking and reporting

The budget needs a dashboard. Make it ugly, make it public, make it weekly.

The dashboard. One row per team. Columns: pages this week, soft cap, hard cap, % of budget burned, top-3 noisy alerts, top-3 services. Color the row green/yellow/red. The visibility is the point: teams can see each other’s noise and learn from the quiet ones.

The retro report. Every Monday, paste the previous-week budget table into the team channel. The team sees their number, sees the leaderboard, makes a plan. 5 minutes; over time it becomes a ritual.

The trend. Track week-over-week change. Healthy teams burn down their pager load over a quarter; teams in trouble see the budget creep up. The trend matters more than the absolute number: a team at 4 pages per week and dropping is healthier than one at 1.5 and rising.
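Both the row coloring and the trend reduce to a couple of comparisons, which is why a Google Sheet is enough to start. A minimal sketch using the article’s caps; the function names and thresholds beyond the caps are assumptions:

```python
SOFT_CAP, HARD_CAP = 2, 5  # weighted noise pages per team-week

def row_color(pages_this_week: float) -> str:
    """Dashboard row color: green under soft cap, yellow under hard cap."""
    if pages_this_week <= SOFT_CAP:
        return "green"
    if pages_this_week <= HARD_CAP:
        return "yellow"
    return "red"

def trend(history: list[float]) -> str:
    """Direction over the tracked weeks; the trend beats the absolute."""
    if len(history) < 2:
        return "flat"
    delta = history[-1] - history[0]
    if delta < 0:
        return "improving"
    return "worsening" if delta > 0 else "flat"

# The article's example: 4/week and dropping vs 1.5/week and rising.
print(row_color(4), trend([6, 5, 4]))        # yellow improving
print(row_color(1.5), trend([0.5, 1, 1.5]))  # green worsening
```

The second team looks healthier on the snapshot and worse on the trend, which is exactly the case the trend column exists to catch.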

Antipatterns

The unenforced budget. A target without consequence is just a number. If teams are routinely 3× over budget with no follow-up, the budget is decoration. Pick a smaller, enforceable number rather than a noble, ignored one.

The downgraded alert. “We’re over budget, let’s demote five alerts to inbox.” If those alerts were actionable, you’ve made the on-call quieter and the system worse. Demote based on the two-question test, not because of budget pressure.

The off-team scapegoat. Some teams blame “platform alerts” or “cloud-provider alerts” for blowing their budget. The fix isn’t to argue about ownership: if the alert pages your team, your team owns it. Either fix it or kill it.

The post-incident exemption. “That was a real incident, doesn’t count.” Sure, for the noise budget. But also: if you’re having lots of real incidents, that’s a separate conversation. Don’t let real incidents hide behind “not noise.”

What to do this week

Three moves. (1) Pick a number. 2 noise-pages per week per team is a good starting cap; revise after a quarter. (2) Stand up the dashboard, even a single Google Sheet with weekly counts works for the first month. (3) In your next planning cycle, agree on the enforcement: weekly review at minimum, feature-freeze for chronic over-budgeters if you can stomach it. The pager-load budget is the cheapest way to make alerting hygiene a team metric instead of a private suffering.