The Incident Correlation Engine groups related alerts into single incidents. When 80 services start erroring because the database is slow, you get one page about "database degraded," not 80 service pages. The engine uses the service graph, time proximity, and shared root signals to identify related alerts; if it gets a grouping wrong, ungrouping is one click.
The engine uses three signals to decide if two alerts belong together. (1) Service-graph proximity: alerts on services connected by the live service map are likely related. (2) Time proximity: alerts within 60 seconds of each other on connected services are very likely related. (3) Shared symptom: alerts citing the same upstream cause (database, redis, queue) consolidate.
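The three signals can be sketched as a simple decision function. This is a minimal illustration, not Nova's actual implementation: the `Alert` shape, the adjacency-map service graph, and the two-of-three voting policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

TIME_WINDOW_S = 60  # time-proximity window from the text

@dataclass
class Alert:
    service: str
    timestamp: float          # epoch seconds
    root_signal: Optional[str]  # e.g. "database", "redis", "queue", or None

def connected(graph: Dict[str, Set[str]], a: str, b: str) -> bool:
    """True if two services are directly linked in the live service map."""
    return b in graph.get(a, set()) or a in graph.get(b, set())

def should_group(a: Alert, b: Alert, graph: Dict[str, Set[str]]) -> bool:
    """Group two alerts when at least two of the three signals agree
    (the voting policy here is illustrative)."""
    graph_signal = connected(graph, a.service, b.service)
    time_signal = abs(a.timestamp - b.timestamp) <= TIME_WINDOW_S
    shared_cause = a.root_signal is not None and a.root_signal == b.root_signal
    return sum((graph_signal, time_signal, shared_cause)) >= 2
```

Two alerts on graph-connected services that fire within the window, or that cite the same upstream cause, consolidate; an isolated alert with no shared signal stays separate.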
Sometimes two genuinely separate incidents fire at the same time. The engine may group them; the operator can split. One click ungrouping splits any group into two new incidents and re-routes the on-call accordingly. The split is recorded so the engine can learn from it.
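A split can be thought of as partitioning one incident's alerts into two and recording the operator's decision. The function below is a hypothetical sketch; the feedback-log shape and field names are assumptions, not Nova's data model.

```python
import time
from typing import List, Set, Tuple

def split_incident(alert_ids: List[str], keep: Set[str],
                   feedback_log: list) -> Tuple[List[str], List[str]]:
    """Split one incident's alerts into two new incidents and record the
    operator's decision so the engine can learn from it (illustrative)."""
    kept = [a for a in alert_ids if a in keep]
    moved = [a for a in alert_ids if a not in keep]
    feedback_log.append({
        "event": "operator_split",  # hypothetical event name
        "ts": time.time(),
        "kept": kept,
        "moved": moved,
    })
    return kept, moved
```

The recorded split is the training signal: future groupings of the same alert pairs can be down-weighted.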
Default tuning is moderate: prefer grouping when the signals are strong. You can dial more aggressive (group at weaker signals, fewer pages, more risk of grouping unrelated incidents) or less aggressive (group only at strong signals, more pages, less risk of bundling). Per-tenant config; ships with sensible defaults.
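One way to picture the per-tenant dial is as a preset that controls how many of the three signals must agree before grouping. The preset names and threshold values below are illustrative assumptions, not Nova's shipped configuration.

```python
from typing import Dict

# Hypothetical per-tenant presets: min_signals is how many of the three
# correlation signals must agree before two alerts are grouped.
PRESETS: Dict[str, Dict[str, int]] = {
    "aggressive":   {"min_signals": 1},  # fewer pages, more mis-group risk
    "moderate":     {"min_signals": 2},  # sensible default
    "conservative": {"min_signals": 3},  # more pages, less bundling risk
}

def grouping_threshold(tenant_config: Dict[str, str]) -> int:
    """Resolve a tenant's preset to a signal threshold, defaulting to moderate."""
    preset = tenant_config.get("correlation_preset", "moderate")
    return PRESETS[preset]["min_signals"]
```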
Weekly report: alerts fired, incidents formed, noise-reduction percentage, mis-group rate (from operator splits), and trend. Use the report to defend the value of the platform when finance asks "what does Nova actually save us?" Pager-fatigue cost becomes a real number, not just a vibe.
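The two headline metrics follow directly from the counts in the report. A minimal sketch, with field names and the exact formulas as assumptions consistent with the definitions above:

```python
from typing import Dict

def weekly_report(alerts_fired: int, incidents_formed: int,
                  operator_splits: int) -> Dict[str, float]:
    """Noise reduction: share of pages avoided by grouping alerts into
    incidents. Mis-group rate: share of incidents the operator had to split."""
    noise_reduction = 100 * (1 - incidents_formed / alerts_fired)
    mis_group_rate = 100 * operator_splits / incidents_formed
    return {
        "alerts_fired": alerts_fired,
        "incidents_formed": incidents_formed,
        "noise_reduction_pct": round(noise_reduction, 1),
        "mis_group_rate_pct": round(mis_group_rate, 1),
    }
```

For example, 800 alerts grouped into 40 incidents with 2 operator splits gives 95% noise reduction and a 5% mis-group rate.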
Pager fatigue is a major MTTR driver. Correlation pages once per incident, not per alert.