The Correlation Window for Alerts
Alerts that fire close together in time are usually one incident. This post covers the correlation window, the grouping algorithm, and the savings in pager noise.
The window
The correlation window for alerts is the time window during which related alerts are grouped together. The window's size determines the trade-off between aggregation (fewer pages) and detection speed (faster notification of new issues).
What window choice looks like:
- 5 minutes is a common default: Many teams start with a 5-minute correlation window. Related alerts within 5 minutes group together, and the on-call gets one notification covering the group.
- Longer means more aggregation: A longer window groups more alerts together. The page count drops and the on-call has fewer notifications to process; the trade-off is detection delay.
- But slower notification of new issues: Longer windows can delay notification of new issues. An issue that starts while a group is still open, and matches that group, does not produce a separate page. It joins the existing group, so the team's response to it is delayed.
- Tune by service criticality: Different services warrant different windows. Critical services need fast notification, so a long window delays too much. Less critical services tolerate longer windows.
- Critical services use 1-2 minutes: A short window limits aggregation but keeps notification fast, so the team responds quickly.
- Low-criticality services use 10-15 minutes: A longer window aggregates more broadly and keeps notification volume down; for these services the cost-benefit favors aggregation.
The window size is the lever. Tuning it produces the right balance for each service.
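As a minimal sketch of per-service tuning, the mapping below picks a window from a criticality tier. The tier names, the service names, and the `correlation_window` helper are all hypothetical; the window values are taken from the ranges listed above, using the upper end of each range.

```python
from datetime import timedelta

# Hypothetical tiers; values come from the ranges discussed above.
WINDOW_BY_CRITICALITY = {
    "critical": timedelta(minutes=2),   # 1-2 min: fast notification
    "standard": timedelta(minutes=5),   # the common default
    "low": timedelta(minutes=15),       # 10-15 min: favor aggregation
}

# Hypothetical service catalog, for illustration only.
SERVICE_CRITICALITY = {
    "payments-api": "critical",
    "batch-reports": "low",
}


def correlation_window(service: str) -> timedelta:
    """Return the correlation window for a service.

    Services missing from the catalog fall back to the standard tier.
    """
    tier = SERVICE_CRITICALITY.get(service, "standard")
    return WINDOW_BY_CRITICALITY[tier]
```

Falling back to the standard tier for unknown services is a design choice in this sketch; a team might prefer to fail loudly so that every service gets an explicit tier.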
The algorithm
The correlation algorithm groups alerts and dissolves groups based on the window. Understanding the algorithm helps the team configure it correctly.
- Group alerts by service and severity: Alerts with the same service and the same severity join the same group. Alerts that differ in service or severity go to different groups.
- Within the window, the second alert in a group does not page: The first alert in a group produces a page. Subsequent alerts within the window are added to the existing group without a new page, so the on-call is not interrupted again.
- Later alerts add to the group: The aggregated notification is updated. The on-call sees each additional alert in the group, and the response covers all of them.
- The group dissolves N minutes after the last alert: Once no alerts have arrived for N minutes, the group dissolves.
- Alerts after dissolution page fresh: If the same conditions recur after the group dissolves, the new alerts are treated as a new incident. They start a new group, the page fires, and the on-call is notified.
The algorithm is mechanical. Once configured, it produces consistent grouping behavior.
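The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation of any particular paging platform: the `Correlator` and `Group` names are hypothetical, the dissolution interval N is assumed to equal the correlation window, and an alert arriving exactly N minutes after the last one is assumed to still join the group.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Group:
    last_seen: datetime                       # time of the most recent alert
    alerts: list = field(default_factory=list)


class Correlator:
    """Groups alerts by (service, severity); hypothetical sketch."""

    def __init__(self, window: timedelta):
        self.window = window                  # assumed to double as N
        self.groups = {}                      # (service, severity) -> Group

    def ingest(self, service, severity, name, at) -> bool:
        """Record an alert and return True if it should page."""
        key = (service, severity)
        group = self.groups.get(key)
        # A group with no alerts for longer than the window has dissolved.
        if group is not None and at - group.last_seen > self.window:
            group = None
        if group is None:
            # First alert of a new group: page.
            self.groups[key] = Group(last_seen=at, alerts=[name])
            return True
        # Alert joins the open group: update it, no new page.
        group.alerts.append(name)
        group.last_seen = at
        return False


t0 = datetime(2024, 1, 1, 12, 0)
c = Correlator(timedelta(minutes=5))
pages = [
    c.ingest("checkout", "critical", "high_latency", t0),
    c.ingest("checkout", "critical", "error_rate", t0 + timedelta(minutes=2)),
    c.ingest("checkout", "warning", "disk_usage", t0 + timedelta(minutes=3)),
    c.ingest("checkout", "critical", "high_latency", t0 + timedelta(minutes=10)),
]
print(pages)  # [True, False, True, True]
```

The second alert joins the open group and does not page. The third has a different severity, so it starts its own group. The fourth arrives 8 minutes after the last critical alert, which is past the 5-minute window, so it pages as a new incident.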
The savings
Correlation cuts page volume during incidents, and the on-call gets one comprehensive notification instead of a stream of partial ones.
- Typical reduction is 50-70% in page volume during incidents: Real incidents produce many alerts. With correlation, the page count for an incident typically drops by 50-70%.
- The on-call sees one page with 10 contributing alerts, not 10 pages: The single page conveys the situation. All 10 alerts are visible within the notification, so the on-call sees the full picture and can respond on that basis.
- Cognitive load drops: Processing one notification is much easier than processing 10. The on-call's attention is preserved and the response is more focused.
- Triage starts with the full picture, not a partial one: The aggregated notification provides the context. The on-call's first action is informed by all the alerts, not by the first one alone.
- The benefit compounds across incidents: The saving applies to each incident. Across many incidents, the cumulative time and attention saved is significant.
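To make the arithmetic concrete, the sketch below compares paging on every alert with paging once per service-and-severity group. The `page_reduction` helper and the incident data are invented for illustration, and the sketch assumes all alerts arrive inside one correlation window.

```python
def page_reduction(alerts):
    """Compare paging every alert with paging once per group.

    `alerts` is a list of (service, severity) tuples that all arrive
    inside one correlation window.
    """
    if not alerts:
        raise ValueError("need at least one alert")
    pages_without = len(alerts)       # one page per alert
    pages_with = len(set(alerts))     # one page per distinct group
    reduction = 1 - pages_with / pages_without
    return pages_without, pages_with, reduction


# Invented incident: 10 alerts spread across 4 groups.
incident = (
    [("checkout", "critical")] * 4
    + [("checkout", "warning")] * 3
    + [("db", "critical")] * 2
    + [("cache", "warning")]
)
without, with_corr, reduction = page_reduction(incident)
print(f"{without} pages -> {with_corr} pages ({reduction:.0%} fewer)")
```

For this made-up incident the result is 10 pages reduced to 4, a 60% drop. When all 10 alerts share one service and severity, the same calculation gives a single page.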
The correlation window for alerts is one of those alerting disciplines that pays off in proportion to alert volume. Nova AI Ops integrates with paging platforms, applies correlation rules, and produces the merged notifications that incident response actually uses.