The Dependency Graph Discipline
Most teams cannot draw their service dependency graph. This section describes the discipline that keeps the graph accurate, queryable, and useful for incident response.
Source of truth
Code-derived: parse the codebase for outbound calls. Captures declared intent, but misses dynamically resolved calls and goes stale after refactors unless the parse is re-run.
Runtime-derived: instrument outbound calls; the graph is built from real traffic.
Combined: declared in code (intent) plus observed at runtime (reality). Discrepancies are the bugs.
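A minimal sketch of the combined approach, assuming each source can be reduced to a set of (caller, callee) pairs. The service names and the diff_edges helper are hypothetical, chosen only for illustration; how the two sets are actually produced (static parsing, tracing, mesh telemetry) is outside this sketch.

```python
# Hypothetical edge sets: each edge is a (caller, callee) pair.
declared = {
    ("checkout", "payments"),
    ("checkout", "inventory"),
    ("payments", "ledger"),
}
observed = {
    ("checkout", "payments"),
    ("payments", "ledger"),
    ("checkout", "email"),
}


def diff_edges(declared, observed):
    """Split the discrepancies between intent and reality into two sets."""
    undeclared = observed - declared   # seen in traffic, never declared
    unobserved = declared - observed   # declared, but no traffic seen
    return undeclared, unobserved


undeclared, unobserved = diff_edges(declared, observed)
print("undeclared:", sorted(undeclared))
print("unobserved:", sorted(unobserved))
```

An undeclared edge usually deserves a closer look than an unobserved one: the latter may simply be a rarely exercised code path that did not fire during the observation window.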
What the graph is for
Incident response: 'service X is down; what else is affected?' The graph answers in seconds.
Migration planning: 'we want to retire service Y; who depends on it?' The graph lists every dependent.
On-call routing: page the right team based on the affected node.
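The three uses above reduce to lookups on a reverse index of the graph. The following is a sketch under stated assumptions: the edge list, the owners mapping, and the function names are all hypothetical, and a production system would likely sit on a graph store rather than in-memory dicts.

```python
from collections import defaultdict, deque

# Hypothetical data: (caller, callee) edges and a service-to-team mapping.
edges = [
    ("web", "checkout"),
    ("checkout", "payments"),
    ("payments", "ledger"),
    ("reports", "ledger"),
]
owners = {"ledger": "team-finance", "payments": "team-payments"}


def build_reverse_index(edges):
    """Map each callee to the set of services that call it directly."""
    dependents = defaultdict(set)
    for caller, callee in edges:
        dependents[callee].add(caller)
    return dependents


def affected_by(service, dependents):
    """Incident response: every service that transitively depends on `service`."""
    seen = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, ()):
            # The `seen` check also keeps cyclic graphs from looping forever.
            if caller != service and caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


dependents = build_reverse_index(edges)

blast_radius = affected_by("ledger", dependents)       # incident response
direct_users = dependents.get("ledger", set())         # migration planning
team_to_page = owners.get("ledger", "unowned")         # on-call routing
```

The breadth-first walk follows edges backwards, from the failed service to its callers, which is why the index is keyed by callee rather than by caller.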
Maintenance
Auto-refresh from runtime. The graph stays current as code changes.
Alert on unexpected new edges. A new dependency that nobody declared is a flag worth investigating.
Quarterly review: are there services nobody depends on? They might be retirement candidates.
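The alerting and review checks can be sketched as set operations over graph snapshots. The snapshot contents, service names, and function names below are hypothetical, and the sketch assumes a snapshot is a set of (caller, callee) pairs.

```python
def unexpected_new_edges(previous, current, declared):
    """Edges that appeared since the last snapshot and were never declared."""
    return (current - previous) - declared


def services_without_dependents(services, edges):
    """Services that no other service calls; candidates for review, not proof."""
    depended_on = {callee for _caller, callee in edges}
    return services - depended_on


# Hypothetical snapshots from two consecutive refreshes.
previous = {("web", "checkout"), ("checkout", "payments")}
current = previous | {("checkout", "email"), ("payments", "ledger")}
declared = {("web", "checkout"), ("checkout", "payments"), ("payments", "ledger")}

alerts = unexpected_new_edges(previous, current, declared)

services = {"web", "checkout", "payments", "ledger", "email", "legacy-export"}
candidates = services_without_dependents(services, current)

print("alert on:", sorted(alerts))
print("review:", sorted(candidates))
```

Entry points such as a user-facing frontend also show up with no dependents, so the candidate list is a starting point for the quarterly review rather than a retirement list.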