Alert Suppression Patterns: Maintenance Windows Done Right
Suppression done well saves on-call sanity. Done badly, it hides real incidents. The patterns are well-known; the discipline is the missing piece.
When to suppress at all
Suppression is the right call when an alert keeps firing on a known-broken state that the team is already aware of. Repeating the same page wakes the same person for no benefit.
It is the wrong call when used to avoid fixing the underlying noise. In that case the fix is tuning the alert, not silencing it.
Four suppression patterns
- 1. Silence. Specific alert; specific window; explicit reason.
- 2. Downtime. Whole service; planned maintenance.
- 3. Dependent suppression. If parent fires, suppress children.
- 4. Deploy-window. Auto-suppress related alerts during deploys.
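As a rough illustration of patterns 1 and 3, the sketch below models a silence as a record with an alert name, a window, and a reason, and suppresses child alerts while their parent is firing. The `Silence` class, the `PARENTS` map, the `should_page` function, and the alert names are all hypothetical; a real alerting system will have its own equivalents.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Silence:
    """Pattern 1: one alert, one window, one stated reason (hypothetical record)."""
    alert_name: str
    starts_at: datetime
    ends_at: datetime
    reason: str

    def covers(self, alert_name: str, at: datetime) -> bool:
        # A silence applies only to its own alert and only inside its window.
        return self.alert_name == alert_name and self.starts_at <= at < self.ends_at


# Pattern 3: child alert -> parent alert. Names are illustrative only.
PARENTS = {
    "api_latency_high": "database_down",
    "queue_backlog": "database_down",
}


def should_page(alert_name, firing, silences, at):
    """Return True if `alert_name` should page at time `at`.

    `firing` is the set of alert names currently firing.
    """
    if any(s.covers(alert_name, at) for s in silences):
        return False  # explicitly silenced
    parent = PARENTS.get(alert_name)
    if parent is not None and parent in firing:
        return False  # the parent already pages; the child adds nothing
    return True


# Example: a two-hour silence while a disk is being replaced.
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
disk_swap = Silence("disk_full", start, start + timedelta(hours=2), "replacing disk")
```

Note that the parent alert itself is never suppressed by this logic, so one page still reaches on-call.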
Auto-expire as the safety
Every silence has an expiry. No exceptions. The longest acceptable silence is 24 hours unless renewed by an explicit action.
Silences without expiry rot into permanent gaps in coverage. Auto-expire is the only protection that does not rely on someone remembering to clean up.
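One way to enforce the rule is to refuse to create a silence that has no duration or exceeds the 24-hour ceiling. The sketch below assumes a hypothetical `silence_end` helper sitting in front of whatever system actually stores silences; renewal is simply another call with a fresh start time.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_SILENCE = timedelta(hours=24)  # the ceiling stated in the text


def silence_end(start: datetime, requested: Optional[timedelta]) -> datetime:
    """Return the expiry time for a silence, rejecting open-ended or over-long ones."""
    if requested is None:
        raise ValueError("a silence must have an expiry")
    if requested <= timedelta(0):
        raise ValueError("silence duration must be positive")
    if requested > MAX_SILENCE:
        raise ValueError("silences are capped at 24 hours; renew explicitly instead")
    return start + requested
```

Because a renewal has to go through the same check, each extension is an explicit action rather than a default.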
Audit cadence
Review all silences and suppressions every quarter. Anything still active after a full quarter is a sign of unfinished alert tuning.
The review catches what auto-expire misses: silences that were renewed again and again when the alert should have been fixed instead.
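The review itself can start from a simple report. The sketch below assumes each silence is exported as a hypothetical `SilenceRecord` carrying its first creation time, current expiry, and renewal count, and it approximates a quarter as 90 days. It lists silences that are still active a quarter after they were first created, most-renewed first.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

QUARTER = timedelta(days=90)  # assumption: a quarter approximated as 90 days


@dataclass
class SilenceRecord:
    alert_name: str
    first_created: datetime  # when the silence was first put in place
    ends_at: datetime        # current expiry, after any renewals
    renewals: int            # how many times it has been renewed


def needs_tuning(records, now):
    """Return active silences first created at least a quarter ago, most-renewed first."""
    stale = [
        r for r in records
        if r.ends_at > now and now - r.first_created >= QUARTER
    ]
    return sorted(stale, key=lambda r: r.renewals, reverse=True)
```

Each entry in the result is a candidate for alert tuning rather than another renewal.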
Antipatterns
- Silence as a workaround for noisy alerts. Fix the alert.
- Silences without expiry. Whoever inherits the service inherits the coverage gap.
- Manual deploy-window suppression. Forget once and that deploy pages on-call.
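The alternative to manual deploy-window suppression is to tie the suppression to the deploy itself. A minimal sketch, assuming a hypothetical in-memory `active_suppressions` set standing in for the alerting system's state and a hypothetical `deploy_window` helper wrapped around the deploy step:

```python
from contextlib import contextmanager

# Stand-in for the alerting system's suppression state (hypothetical).
active_suppressions = set()


@contextmanager
def deploy_window(service):
    """Suppress a service's alerts for exactly the duration of the deploy."""
    active_suppressions.add(service)
    try:
        yield
    finally:
        # Runs even if the deploy raises, so the suppression cannot outlive it.
        active_suppressions.discard(service)
```

A deploy script would then run its steps inside `with deploy_window("checkout"):`, where `"checkout"` is an example service name. Nobody has to remember to start or stop the suppression, and a failed deploy does not leave it behind. This sketch does not handle overlapping deploys of the same service.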
What to do this week
Three moves. (1) Apply the pattern above that fits your noisiest alert. (2) Measure pages-per-shift before and after for one week. (3) Schedule the quarterly review so the discipline survives team turnover.