Alert Dependency Graph

Alerts depend on metrics, services, and integrations. Map those dependencies as a graph.

Why map dependencies

Alerts depend on metrics, metrics depend on exporters, exporters depend on services, and services depend on integrations. Break any link and the alert silently stops firing. Without a graph, you discover that the node-exporter pod has been gone for two months only during an outage, when the host alert never fires. Treat the alert catalog as a dependency graph, not a flat list.

What's in the graph

The graph has well-defined nodes and edges. Nodes: alerts, recording rules, metric names, exporters, services, datasources. Edges: alert depends_on rule, rule depends_on metric, metric exposed_by exporter, exporter runs_on service, service ingested_by datasource. Store it in Neo4j or in a flat YAML file; the discipline of writing it down is what matters.
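The node and edge vocabulary above can be sketched as a small in-memory structure. This is a minimal illustration, not a prescribed schema: the class and method names (Node, DependencyGraph, add_edge, chain) and every example node name are hypothetical; only the node kinds and edge labels come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass

# Node kinds and edge labels as listed in the text.
NODE_KINDS = {"alert", "rule", "metric", "exporter", "service", "datasource"}
EDGE_LABELS = {"depends_on", "exposed_by", "runs_on", "ingested_by"}


@dataclass(frozen=True)
class Node:
    kind: str
    name: str


class DependencyGraph:
    def __init__(self):
        # source node -> list of (edge label, target node)
        self.edges = defaultdict(list)

    def add_edge(self, src, label, dst):
        """Record one labelled edge, rejecting unknown kinds or labels."""
        for node in (src, dst):
            if node.kind not in NODE_KINDS:
                raise ValueError(f"unknown node kind: {node.kind}")
        if label not in EDGE_LABELS:
            raise ValueError(f"unknown edge label: {label}")
        self.edges[src].append((label, dst))

    def chain(self, start):
        """Return every node reachable from start, depth-first, start included."""
        seen, stack, order = set(), [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            order.append(node)
            # Reverse so that edges are visited in insertion order.
            for _label, dst in reversed(self.edges.get(node, [])):
                stack.append(dst)
        return order


# Hypothetical example: one alert and everything beneath it.
graph = DependencyGraph()
alert = Node("alert", "HostHighCpu")
rule = Node("rule", "instance:node_cpu:rate5m")
metric = Node("metric", "node_cpu_seconds_total")
exporter = Node("exporter", "node-exporter")
service = Node("service", "host-fleet")
datasource = Node("datasource", "prometheus-main")

graph.add_edge(alert, "depends_on", rule)
graph.add_edge(rule, "depends_on", metric)
graph.add_edge(metric, "exposed_by", exporter)
graph.add_edge(exporter, "runs_on", service)
graph.add_edge(service, "ingested_by", datasource)

for node in graph.chain(alert):
    print(f"{node.kind}: {node.name}")
```

The same edges could equally be serialized as a flat list of source, label, target triples; the in-memory form only makes traversal convenient.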

What the graph unlocks

The graph unlocks three concrete benefits. The first is blast radius for a metric rename: if renaming http_requests_total breaks 14 alerts, the graph lists them. The second is health checking of the dependency chain: a synthetic check walks the graph and catches broken exporters before they break alerts. The third is onboarding: new engineers see relationships rather than isolated rules.
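The blast-radius query amounts to a reverse traversal: start at the node that is about to change and collect everything that transitively depends on it. The sketch below assumes the graph is stored as (source, label, target) triples with nodes written as (kind, name) pairs; the function name blast_radius and all rule and alert names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical edge list; each node is a (kind, name) pair.
EDGES = [
    (("alert", "HighErrorRate"), "depends_on", ("rule", "job:http_errors:ratio5m")),
    (("rule", "job:http_errors:ratio5m"), "depends_on", ("metric", "http_requests_total")),
    (("alert", "NoTraffic"), "depends_on", ("rule", "job:http_requests:rate5m")),
    (("rule", "job:http_requests:rate5m"), "depends_on", ("metric", "http_requests_total")),
    (("alert", "HostHighCpu"), "depends_on", ("rule", "instance:node_cpu:rate5m")),
    (("rule", "instance:node_cpu:rate5m"), "depends_on", ("metric", "node_cpu_seconds_total")),
    (("metric", "node_cpu_seconds_total"), "exposed_by", ("exporter", "node-exporter")),
]


def blast_radius(edges, target, kind="alert"):
    """Names of all nodes of the given kind that transitively depend on target."""
    # Reverse index: node -> nodes that point directly at it.
    dependents = defaultdict(set)
    for src, _label, dst in edges:
        dependents[dst].add(src)

    # Breadth-first walk up the reversed edges.
    seen = {target}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    return sorted(name for node_kind, name in seen if node_kind == kind)


print(blast_radius(EDGES, ("metric", "http_requests_total")))
print(blast_radius(EDGES, ("exporter", "node-exporter")))
```

Walking the same edges in the forward direction from an alert yields the chain a synthetic health check would need to probe.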

Automating the graph

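Automation builds the graph cheaply. Validate rule files with promtool check rules, then parse each rule's PromQL expression to extract metric references and build metric-to-rule edges (promtool validates rules but does not list the metrics they reference). Read exporter health from scrape success, which Prometheus records as the up series for each target, and build metric-to-exporter edges from scrape labels such as job and instance. Pull service ownership from a CMDB or Backstage rather than reinventing service catalog data.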
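The rule-parsing step can be sketched as follows. This is a rough heuristic under stated assumptions, not a PromQL parser: it strips strings, label matchers, range selectors, and grouping clauses with regular expressions, then treats the remaining identifiers as metric references. It will miss selectors written as {__name__="..."} and can misfire on unusual expressions; a real PromQL parser is the robust choice. The input is assumed to be a rule file already loaded into a dict (for example with PyYAML's yaml.safe_load), and the function names metric_refs and rule_edges and the example rules are hypothetical.

```python
import re

# PromQL words that can appear where a metric name could, but are not metrics.
_KEYWORDS = {
    "and", "or", "unless", "by", "without", "on", "ignoring",
    "group_left", "group_right", "offset", "bool", "atan2", "inf", "nan",
}

# A whole identifier that is not part of a number and not a function call.
_IDENT = re.compile(
    r"(?<![A-Za-z0-9_:.])([A-Za-z_:][A-Za-z0-9_:]*)(?![A-Za-z0-9_:])(?!\s*\()"
)


def metric_refs(expr):
    """Heuristically extract the metric or recording-rule names used in expr."""
    cleaned = re.sub(r'"[^"]*"|\'[^\']*\'', "", expr)   # string literals
    cleaned = re.sub(r"\{[^}]*\}", "", cleaned)          # label matchers
    cleaned = re.sub(r"\[[^\]]*\]", "", cleaned)         # range selectors
    cleaned = re.sub(                                    # grouping clauses
        r"\b(?:by|without|on|ignoring|group_left|group_right)\s*\([^)]*\)",
        "",
        cleaned,
    )
    return {
        m.group(1)
        for m in _IDENT.finditer(cleaned)
        if m.group(1).lower() not in _KEYWORDS
    }


def rule_edges(rule_file):
    """Build depends_on edges from a loaded Prometheus rule file."""
    rules = [r for g in rule_file.get("groups", []) for r in g.get("rules", [])]
    recorded = {r["record"] for r in rules if "record" in r}
    edges = []
    for r in rules:
        src = ("alert", r["alert"]) if "alert" in r else ("rule", r["record"])
        for name in sorted(metric_refs(r["expr"])):
            # A reference to a recorded series points at a rule node.
            kind = "rule" if name in recorded else "metric"
            edges.append((src, "depends_on", (kind, name)))
    return edges


# Hypothetical rule file, in the shape of Prometheus groups/rules YAML.
RULES = {
    "groups": [
        {
            "name": "http",
            "rules": [
                {
                    "record": "job:http_errors:ratio5m",
                    "expr": 'sum by (job) (rate(http_requests_total{code=~"5.."}[5m]))'
                            " / sum by (job) (rate(http_requests_total[5m]))",
                },
                {
                    "alert": "HighErrorRate",
                    "expr": "job:http_errors:ratio5m > 0.05 and on (job) up == 1",
                    "for": "10m",
                },
            ],
        }
    ]
}

for edge in rule_edges(RULES):
    print(edge)
```

Running promtool check rules on the files first keeps malformed expressions out of this step, so the heuristic only ever sees rules Prometheus itself would accept.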

Worth it above 100 rules

The investment threshold is rule count. Below 100 rules, the catalog is small enough to keep in your head and a dependency graph is overkill. Above 100, the graph pays for itself the first time a renamed metric silently breaks paging alerts. In either case, skip the visual UI tooling: a queryable JSON file is enough.