By Samson Tanimawo, PhD. Published Sep 27, 2025.

Alerts Depending on Other Incidents

Some alerts shouldn't fire while a related incident is already open.

The cascade problem

When the primary database goes down, a hundred services alert at once. Most of those alerts are downstream symptoms; the human only needs the root cause.

Without dependency-aware suppression, the on-call drowns in pages that all describe the same incident. Mean time to acknowledge (MTTA) stays high even when the root cause is identified within a minute, because every downstream page still has to be acknowledged.

PagerDuty, Opsgenie, and Nova AI Ops support dependency rules; few teams configure them properly.

Building the dependency graph

Start with the service catalog. Backstage, OpsLevel, or a homegrown YAML file all work. Map each service to its critical upstream dependencies.
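A minimal sketch of such a mapping, assuming a plain Python dictionary stands in for the catalog. The service names and the `transitive_upstreams` helper are hypothetical and only illustrate the shape of the data.

```python
# Hypothetical catalog: each service maps to its critical upstream dependencies.
CATALOG = {
    "checkout": ["payments", "orders-db"],
    "payments": ["orders-db"],
    "search": ["search-index"],
    "orders-db": [],
    "search-index": [],
}


def transitive_upstreams(service, catalog):
    """Return every service that `service` depends on, directly or indirectly."""
    seen = set()
    stack = list(catalog.get(service, []))
    while stack:
        upstream = stack.pop()
        if upstream not in seen:  # the seen-set also guards against cycles
            seen.add(upstream)
            stack.extend(catalog.get(upstream, []))
    return seen
```

Walking the graph transitively matters because a database outage affects services two or three hops away, not only its direct callers.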

Use OpenTelemetry traces to validate the graph. Manual catalogs drift; a trace-derived graph reflects a new dependency within hours of it appearing in production traffic.
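One way to sketch that validation is a set comparison between declared edges and observed edges. The example assumes the caller/callee pairs have already been extracted from your tracing backend; `diff_graph` and the sample data are hypothetical, and no OpenTelemetry API is used here.

```python
def diff_graph(declared, observed):
    """Compare declared catalog edges with edges observed in traces.

    Both arguments are sets of (caller, callee) pairs.
    """
    return {
        # Seen in traces but missing from the catalog.
        "undeclared": sorted(observed - declared),
        # Declared in the catalog but never seen in traces.
        "unobserved": sorted(declared - observed),
    }


# Hypothetical data; `observed` would come from parent/child span pairs.
declared = {("checkout", "payments"), ("checkout", "orders-db")}
observed = {("checkout", "payments"), ("checkout", "inventory")}
report = diff_graph(declared, observed)
```

Undeclared edges are the dangerous ones for suppression, since a missing edge means downstream alerts will keep paging. Unobserved edges may simply be rarely exercised code paths, so they deserve a review rather than automatic removal.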

Update the graph in CI. A new service's deploy should fail if its dependencies are not declared; enforcing this at deploy time is the only reliable way to keep the graph fresh.
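A CI gate along these lines could be as small as the sketch below. It assumes each catalog entry is a dictionary with a `depends_on` key; that key name, the `check` function, and the sample catalog are all hypothetical.

```python
def missing_declarations(catalog):
    """Return services whose entry has no 'depends_on' key.

    An explicit empty list counts as declared ("no upstreams").
    """
    return sorted(
        name for name, entry in catalog.items() if "depends_on" not in entry
    )


def check(catalog):
    """Print one error per undeclared service and return a CI exit code."""
    missing = missing_declarations(catalog)
    for name in missing:
        print(f"ERROR: {name} does not declare depends_on")
    return 1 if missing else 0


# Hypothetical catalog contents.
CATALOG = {
    "checkout": {"depends_on": ["payments"]},
    "payments": {"depends_on": []},
    "search": {},
}
exit_code = check(CATALOG)  # a CI script would pass this to sys.exit()
```

Requiring an explicit empty list, rather than treating a missing key as "no dependencies", forces the service owner to make a deliberate statement.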

Suppression rules

If service A is in an incident state and service B depends on A, suppress B's alerts for the duration of A's incident plus a 5-minute cooldown.
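The rule above can be sketched as a single predicate. The data shapes (an `incidents` dictionary of start/end times and a `depends_on` dictionary) are assumptions for illustration, and this version checks direct dependencies only.

```python
from datetime import datetime, timedelta

COOLDOWN = timedelta(minutes=5)


def should_suppress(alert_service, alert_time, incidents, depends_on):
    """Return True if an upstream of `alert_service` is in incident or cooling down.

    incidents: service -> (start, end), where end is None while still open.
    depends_on: service -> set of direct upstream services.
    """
    for upstream in depends_on.get(alert_service, set()):
        window = incidents.get(upstream)
        if window is None:
            continue  # this upstream has no incident
        start, end = window
        if alert_time < start:
            continue  # alert fired before the upstream incident began
        if end is None or alert_time <= end + COOLDOWN:
            return True
    return False
```

The cooldown covers the period after the upstream recovers, when downstream services are still draining queues or reconnecting and would otherwise page for a problem that is already resolving.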

Always log the suppression. The on-call should be able to query "what was suppressed during incident X?" for the postmortem.
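A minimal in-memory sketch of such a log follows. The `SuppressionRecord` fields and the `SuppressionLog` class are hypothetical; a real deployment would more likely write to a database or the alerting tool's own audit trail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuppressionRecord:
    incident_id: str    # the upstream incident that caused the suppression
    service: str        # the downstream service whose alert was suppressed
    alert_name: str
    suppressed_at: str  # ISO 8601 timestamp


class SuppressionLog:
    def __init__(self):
        self._records = []

    def record(self, incident_id, service, alert_name, suppressed_at):
        """Store one suppression decision."""
        self._records.append(
            SuppressionRecord(incident_id, service, alert_name, suppressed_at)
        )

    def for_incident(self, incident_id):
        """Answer: what was suppressed during this incident?"""
        return [r for r in self._records if r.incident_id == incident_id]
```

Keying each record by the incident that triggered it makes the postmortem query a single filter.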

Never suppress security alerts or data-loss alerts. The cost of a missed signal there outweighs any noise reduction.
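This exemption can be sketched as a guard that runs before any dependency rule. The `category` field and its values are assumptions about how alerts are labeled.

```python
# Hypothetical category labels; adapt to however your alerts are tagged.
NEVER_SUPPRESS = frozenset({"security", "data-loss"})


def is_suppressible(alert):
    """Return False for alerts that must always page, whatever the incident state."""
    return alert.get("category") not in NEVER_SUPPRESS
```

In this sketch, an alert with no category is treated as suppressible. A stricter setup could instead refuse to suppress anything that is uncategorized.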

When suppression backfires

Stale dependency graphs suppress real alerts. Always include a kill switch to disable suppression during a major incident.
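A kill switch can be as simple as an environment variable checked on every routing decision. In this sketch the variable name `SUPPRESSION_DISABLED` and the `route` function are hypothetical.

```python
import os


def suppression_enabled(env=os.environ):
    """Suppression stays on unless SUPPRESSION_DISABLED is set to a truthy value."""
    flag = env.get("SUPPRESSION_DISABLED", "").strip().lower()
    return flag not in {"1", "true", "yes"}


def route(alert_is_downstream, env=os.environ):
    """Suppress downstream alerts only while the kill switch is off."""
    if alert_is_downstream and suppression_enabled(env):
        return "suppress"
    return "page"
```

Reading the flag at decision time, rather than once at startup, means the switch takes effect without a redeploy, which matters during a major incident.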

Bidirectional dependencies (rare but real) confuse simple rule engines. Map them explicitly or use a graph-aware engine.
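To find these cases before they confuse a rule engine, a sketch like the following lists every pair of services that depend on each other directly. The function name and the graph shape are assumptions; longer cycles (A to B to C to A) would need full cycle detection.

```python
def bidirectional_pairs(depends_on):
    """Return sorted pairs of services that list each other as upstreams."""
    pairs = set()
    for service, upstreams in depends_on.items():
        for upstream in upstreams:
            if service != upstream and service in depends_on.get(upstream, ()):
                # Sort the pair so (a, b) and (b, a) collapse into one entry.
                pairs.add(tuple(sorted((service, upstream))))
    return sorted(pairs)
```

Each pair this returns needs an explicit decision about which side is treated as the root when both are alerting.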

Cross-team dependencies need cross-team postmortems. Suppression that hides another team's incident from them is worse than no suppression.

Get started

Pick the top 5 services by page volume. Map each to its 3 most-critical upstreams.
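Ranking services by page volume is a simple count. This sketch assumes you can export one service name per page from your paging tool; `top_services` is a hypothetical helper.

```python
from collections import Counter


def top_services(pages, n=5):
    """Return the `n` services with the most pages, highest volume first.

    pages: iterable of service names, one entry per page sent.
    """
    return [service for service, _ in Counter(pages).most_common(n)]
```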

Configure dependency suppression in PagerDuty Event Orchestration or the equivalent feature of your alerting tool.

Run for one month, then review every suppressed alert in a postmortem. Adjust the rules until the false-suppression rate (suppressed alerts that should have paged, as a share of all suppressed alerts) is under 1%.
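The review metric can be computed as in the sketch below. It assumes each suppressed alert has been marked during review as either correctly suppressed or not; the function name and input shape are hypothetical.

```python
def false_suppression_rate(reviewed):
    """Share of suppressed alerts that should have paged.

    reviewed: list of booleans, True when the review found the alert
    was wrongly suppressed.
    """
    if not reviewed:
        return 0.0  # nothing was suppressed, so nothing was wrongly suppressed
    return sum(reviewed) / len(reviewed)
```

For example, one wrongly suppressed alert out of 200 reviewed gives a rate of 0.5%, which is under the 1% target.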