Fan-In and Fan-Out Alert Patterns
Some alerts should aggregate; others should split. These are the two routing patterns for doing that.
Two patterns
Fan-in: many signals collapse to one alert. Useful when 50 instances each fire "high CPU" and the on-call only needs one page about the cluster.
Fan-out: one signal triggers alerts to multiple teams. Useful when a database alert needs to reach the database team, the platform team, and the on-call rotation.
Both patterns are about routing. They are not opposites; they solve different problems.
When to fan in
Multiple sources signaling the same root cause. Cluster-wide CPU spikes, region-wide error increases, fleet-wide deploy failures.
Group by service, region, or deployment. PagerDuty event orchestration and Nova AI Ops both expose grouping windows.
Group window: 5 to 15 minutes. Shorter misses related signals; longer holds back real escalation.
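A minimal sketch of a grouping window in Python. This is not PagerDuty's or Nova AI Ops's actual API; the `FanIn` class, its alert dicts, and the `(service, region)` group key are assumptions for illustration. The idea: the first alert in a group opens a page, and anything with the same key inside the window is suppressed into it.

```python
class FanIn:
    """Collapse alerts sharing a group key into one page per window.

    Hypothetical sketch: alerts are dicts with "service" and "region" keys,
    and `now` is a timestamp in seconds (passed in to keep the example testable).
    """

    def __init__(self, window_s=600):  # 10 minutes, inside the 5-15 minute range above
        self.window_s = window_s
        self.open_groups = {}  # group key -> (opened_at, [alerts])

    def ingest(self, alert, now):
        """Return a new aggregated page, or None if the alert joined an open group."""
        key = (alert["service"], alert["region"])
        group = self.open_groups.get(key)
        if group is not None and now - group[0] < self.window_s:
            group[1].append(alert)  # suppressed: folded into the existing page
            return None
        self.open_groups[key] = (now, [alert])
        return {"summary": f"{key[0]} in {key[1]}: grouped alert", "count_so_far": 1}
```

With a 10-minute window, fifty "high CPU" alerts from one cluster within that window produce one page; a second page opens only after the window expires.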
When to fan out
When one alert genuinely affects multiple teams. A database outage hits the DB team (fix), the platform team (capacity context), and the on-call (paging).
Use distinct routing rules per audience. Don't send the same payload to all three; tailor it.
Avoid fan-out for political reasons ("manager wants to be notified of everything"). Build a digest instead.
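A sketch of per-audience routing, following the "tailor the payload" rule above. The route table, team names, channels, and field names are assumptions for illustration, not any vendor's schema; each route picks only the fields its audience needs from the source alert.

```python
# Hypothetical route table: one entry per audience, each with its own payload shape.
ROUTES = [
    {"team": "db",       "channel": "page",   "fields": ["error", "host", "replica_lag"]},
    {"team": "platform", "channel": "ticket", "fields": ["error", "capacity"]},
    {"team": "oncall",   "channel": "page",   "fields": ["error", "runbook"]},
]

def fan_out(alert):
    """Produce one tailored payload per route; fields a team doesn't need are dropped."""
    return [
        {
            "team": route["team"],
            "channel": route["channel"],
            "payload": {k: alert[k] for k in route["fields"] if k in alert},
        }
        for route in ROUTES
    ]
```

Note the platform team gets a ticket, not a page: fanning out does not mean every audience gets the same urgency.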
Anti-patterns
Fan-out everything: every team gets every alert. Alert fatigue spreads across the org instead of being concentrated.
Fan-in too aggressively: one mega-alert per region per hour. Real signals get suppressed; the on-call sees a wall of "something happened somewhere".
Fan-in by pattern-matching on service names. The pattern breaks the moment a service is renamed; group on explicit labels instead.
Pick by use case
Same root cause, many sources: fan in. Use grouping windows.
Different audiences, one signal: fan out. Tailor each route.
When in doubt: don't add either. Most alerts work fine as one signal to one team.