The Correlation Window for Alerts
Alerts that fire close together in time usually belong to one incident. This note covers the correlation window, the grouping algorithm, and the resulting savings in pager noise.
The window
A 5-minute window is the usual default. A longer window aggregates more alerts into each group but slows notification of new issues.
Tune the window by service criticality: 1-2 minutes for critical services, 10-15 minutes for low-criticality ones.
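A minimal sketch of per-criticality tuning, assuming a simple lookup table. The tier names (`critical`, `standard`, `low`) and the function `correlation_window` are hypothetical; the durations are picked from inside the ranges given above.

```python
from datetime import timedelta

# Hypothetical tiers; values chosen from the ranges in the text.
WINDOW_BY_CRITICALITY = {
    "critical": timedelta(minutes=2),
    "standard": timedelta(minutes=5),
    "low": timedelta(minutes=15),
}

# Fall back to the 5-minute default for unknown tiers.
DEFAULT_WINDOW = timedelta(minutes=5)


def correlation_window(criticality: str) -> timedelta:
    """Return the correlation window for a service tier."""
    return WINDOW_BY_CRITICALITY.get(criticality, DEFAULT_WINDOW)
```

Keeping the mapping in one table makes it easy to review which services trade faster notification for less noise.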
The algorithm
Group alerts by service and severity. Within the window, the first alert in a group pages; each later alert does not page and is added to the group instead.
A group dissolves N minutes after its last alert. An alert that arrives after its group has dissolved starts a new group and pages again.
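The grouping rule can be sketched as follows. This is an illustration, not a reference implementation: the names `Correlator`, `Group`, and `ingest` are hypothetical, and the sketch assumes that the correlation window and the dissolve time N are the same value, measured from the last alert in the group.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Group:
    last_seen: datetime
    alerts: list = field(default_factory=list)


class Correlator:
    def __init__(self, dissolve_after: timedelta):
        # Assumption: one duration serves as both the window and N.
        self.dissolve_after = dissolve_after
        self.groups = {}  # (service, severity) -> Group

    def ingest(self, service: str, severity: str, name: str, at: datetime) -> bool:
        """Record an alert and return True if it should page."""
        key = (service, severity)
        group = self.groups.get(key)
        if group is not None and at - group.last_seen < self.dissolve_after:
            # Group is still live: attach the alert, do not page.
            group.alerts.append(name)
            group.last_seen = at
            return False
        # No group yet, or the old one has dissolved: start fresh and page.
        self.groups[key] = Group(last_seen=at, alerts=[name])
        return True
```

With a 5-minute setting, an alert at 12:00 pages, an alert in the same group at 12:03 is absorbed, and an alert at 12:09 pages again because six minutes have passed since the last one.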
The savings
Typical reduction: 50-70% in page volume during incidents. The on-call sees one page with 10 contributing alerts, not 10 pages.
Cognitive load drops. Triage starts with the full picture, not a partial one.
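The reduction figure is simple arithmetic: the fraction of alerts that did not become pages. A small sketch, with the hypothetical helper name `page_reduction`:

```python
def page_reduction(alerts: int, pages: int) -> float:
    """Fraction of pages avoided, computed as 1 - pages / alerts."""
    if alerts <= 0:
        raise ValueError("alerts must be positive")
    return 1 - pages / alerts
```

For the single group described above, `page_reduction(10, 1)` gives 0.9. An incident-wide figure would presumably be lower than that, since alerts spread across several service and severity groups each produce their own page.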