Distributed Tracing for Multi-Agent Systems
When five agents collaborate, a single trace is the only practical way to debug. This article covers the instrumentation, the span layout, and the queries that find the slow specialist.
Why one trace
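Per-agent logs show what each agent did in isolation. They do not show which call caused which, or where the time went across agent boundaries. A single trace that contains every span does, which is why per-agent logs are insufficient.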
OpenTelemetry is the standard. Use it. The agent framework should integrate with OTel by default; if it does not, wrap it.
One trace per user-visible operation. The triage-then-remediate flow is one trace, with sub-spans per agent.
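If the framework has no OTel integration, the wrapper can be as small as a decorator. The sketch below is illustrative only and uses just the standard library: FINISHED_SPANS, traced, triage, and handle_alert are hypothetical names, and the in-memory list stands in for a real OTel tracer and exporter. It shows the mechanism that matters, which is that a nested call inherits the trace_id of the active span and records that span as its parent.

```python
import contextvars
import functools
import time
import uuid

# Hypothetical stand-in for an OTel exporter: finished spans are appended here.
FINISHED_SPANS = []
# Tracks the span that is active in the current execution context.
_current_span = contextvars.ContextVar("current_span", default=None)


def traced(name, **attributes):
    """Wrap a callable so each call records a span nested under the active one."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            parent = _current_span.get()
            span = {
                "name": name,
                "span_id": uuid.uuid4().hex[:16],
                # A root span starts a new trace; a child reuses its parent's trace_id.
                "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
                "parent_id": parent["span_id"] if parent else None,
                "attributes": dict(attributes),
                "error": False,
            }
            token = _current_span.set(span)
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            except Exception:
                span["error"] = True
                raise
            finally:
                span["duration_s"] = time.perf_counter() - start
                _current_span.reset(token)
                FINISHED_SPANS.append(span)
        return wrapper
    return decorator


@traced("triage", agent_role="triage")
def triage(alert):
    # Hypothetical agent: a real one would call a model and tools here.
    return f"remediate:{alert}"


@traced("handle_alert")
def handle_alert(alert):
    # The user-visible operation is the root span; the agent call nests under it.
    return triage(alert)


result = handle_alert("disk-full")
```

Spans are appended as they finish, so the child span lands in FINISHED_SPANS before the root. A production wrapper would delegate to the OTel SDK rather than keep its own list.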
Span layout
Root span: the user-visible operation ("handle alert X"). Direct children: each agent invocation. Grandchildren: tool calls and model calls inside each agent.
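Each span carries the same attributes: agent_role, model_name, tokens, cost. Because they are repeated on every span, you can filter and aggregate at any level of the tree without joining back to a parent.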
Use semantic conventions where they exist (the OpenTelemetry semantic conventions for generative AI define gen_ai.* attributes such as gen_ai.request.model). Where they do not, document your own conventions internally.
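The three-level layout can be sketched as a small tree. This is a minimal illustration, not a real OTel data model: the Span dataclass, the helper functions, the model names, and the token and cost figures are all made up for the example. It shows why attributes on the leaf spans are useful: rolling tokens or cost up to each agent is a short recursive sum.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical span record; real OTel spans carry more fields.
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def subtree_total(span, key):
    """Sum a numeric attribute over a span and all of its descendants."""
    own = span.attributes.get(key, 0)
    return own + sum(subtree_total(child, key) for child in span.children)


def per_agent_totals(root, key):
    """Roll a numeric attribute up to each direct child (agent) of the root."""
    totals = {}
    for agent in root.children:
        role = agent.attributes["agent_role"]
        totals[role] = totals.get(role, 0) + subtree_total(agent, key)
    return totals


# Root: the user-visible operation. Children: agents. Grandchildren: model and tool calls.
trace = Span("handle alert X", children=[
    Span("triage", {"agent_role": "triage"}, [
        Span("model call", {"agent_role": "triage", "model_name": "model-a",
                            "tokens": 1200, "cost": 0.012}),
        Span("tool: fetch_logs", {"agent_role": "triage"}),
    ]),
    Span("remediate", {"agent_role": "remediate"}, [
        Span("model call", {"agent_role": "remediate", "model_name": "model-b",
                            "tokens": 800, "cost": 0.02}),
        Span("model call", {"agent_role": "remediate", "model_name": "model-b",
                            "tokens": 400, "cost": 0.01}),
    ]),
])

tokens_by_agent = per_agent_totals(trace, "tokens")
cost_by_agent = per_agent_totals(trace, "cost")
```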
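Useful queries
"Slow agent runs": traces whose root-span duration exceeds p99 for that operation. The slow runs are where bugs hide.
"Specialist used most often": span count by agent_role. Tells you which specialists the system leans on most. To get the dependency graph between specialists, count parent-child pairs of agent_role instead.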
"Trace with error": traces with at least one error span. The error span is the starting point for debugging.
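The three queries above can be sketched over a flat list of span records. Most tracing backends have their own query language, so treat this as an illustration of the logic only. The record fields (trace_id, parent_id, agent_role, duration_ms, error), the make_span helper, and the synthetic data are assumptions for the example. The percentile uses the nearest-rank definition, which means a "duration > p99" filter returns nothing until there are more than 100 traces.

```python
from collections import Counter


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct percent of the data is at or below it."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


def slow_traces(spans, pct=99):
    """Trace ids whose root-span duration is strictly above the given percentile."""
    durations = {s["trace_id"]: s["duration_ms"] for s in spans if s["parent_id"] is None}
    threshold = nearest_rank_percentile(durations.values(), pct)
    return sorted(tid for tid, ms in durations.items() if ms > threshold)


def span_count_by_role(spans):
    """How many spans each agent role produced (root spans have no role)."""
    return Counter(s["agent_role"] for s in spans if s["agent_role"])


def traces_with_errors(spans):
    """Trace ids that contain at least one error span."""
    return sorted({s["trace_id"] for s in spans if s["error"]})


def make_span(trace_id, parent_id, agent_role, duration_ms, error=False):
    # Hypothetical flat span record used only for this example.
    return {"trace_id": trace_id, "parent_id": parent_id, "agent_role": agent_role,
            "duration_ms": duration_ms, "error": error}


# Synthetic data: 200 traces whose root durations are 1..200 ms.
spans = []
for i in range(1, 201):
    tid = f"t{i:03d}"
    spans.append(make_span(tid, None, None, i))
    spans.append(make_span(tid, "root", "triage", i * 0.6))
    if i % 4 == 0:
        spans.append(make_span(tid, "root", "remediate", i * 0.3, error=(i == 200)))

slow = slow_traces(spans)
by_role = span_count_by_role(spans)
errored = traces_with_errors(spans)
```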
Context propagation
Trace context (trace_id, span_id) is passed between agents. Every inter-agent message carries it; every tool call inherits it.
Propagation breaks when an agent runs in a separate process and the message that reaches it does not carry the context. Use the standard OTel propagators (W3C TraceContext), which serialize the context into a traceparent header, to keep everything connected.
Test propagation. A trace that drops a span at an agent boundary is a broken integration; fix it before it spreads.
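To make the mechanism and the test concrete, here is a hand-rolled sketch of the W3C traceparent format (version, 32-hex trace id, 16-hex parent span id, 2-hex flags). In a real system the OTel propagators do this work. The inject, extract, and start_child names below are hypothetical helpers written for this example, not the OTel API, and the message dict stands in for whatever envelope your agents exchange.

```python
import re
import secrets

# W3C Trace Context, version 00: "00-<trace-id>-<parent-id>-<flags>", lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(carrier, trace_id, span_id, sampled=True):
    """Write the caller's trace context into an outgoing message."""
    carrier["traceparent"] = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
    return carrier


def extract(carrier):
    """Read trace context from an incoming message; None if absent or invalid."""
    match = _TRACEPARENT.match(carrier.get("traceparent", ""))
    if not match:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero ids are invalid per the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {"trace_id": trace_id, "parent_id": parent_id,
            "sampled": bool(int(flags, 16) & 1)}


def start_child(carrier):
    """Start a span in the receiving agent, joining the caller's trace if possible."""
    ctx = extract(carrier)
    if ctx is None:
        # Broken propagation: the receiver silently starts an unrelated trace.
        return {"trace_id": secrets.token_hex(16), "parent_id": None,
                "span_id": secrets.token_hex(8)}
    return {"trace_id": ctx["trace_id"], "parent_id": ctx["parent_id"],
            "span_id": secrets.token_hex(8)}


# Sender side: the triage agent hands off to the remediation agent.
caller_trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
caller_span_id = "00f067aa0ba902b7"
message = inject({"body": "remediate disk-full"}, caller_trace_id, caller_span_id)

# Receiver side: one span joins the trace, one shows what a dropped header looks like.
joined = start_child(message)
orphan = start_child({"body": "remediate disk-full"})
```

A propagation test at each agent boundary can be this simple: send a message, start the child, and assert that the trace_id matches and the parent_id is the caller's span_id. The orphan case is what a dropped span looks like in storage: a new root with no link to the operation that caused it.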
Cost of tracing
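Sampling: sample 10% of traces for high-volume agents and 100% for low-volume ones. 10% is enough for aggregates; debugging an individual issue needs that specific trace, which only 100% guarantees. Make the decision once per trace so that a sampled trace keeps all of its spans.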
Storage: traces are voluminous. Pick a sane retention (for example, 30 days hot, 90 days warm).
Latency: correctly configured tracing adds less than 5% overhead. A bad configuration can add 30%, so profile and tune.
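The sampling rule can be sketched as a deterministic function of the trace id, so every agent that sees the same trace reaches the same decision without coordination. This is an illustrative sketch rather than the OTel SDK's sampler. The SAMPLE_RATES table and should_sample function are hypothetical names, and the trace ids are generated from a hash only to make the example repeatable.

```python
import hashlib

# Hypothetical per-tier rates matching the 10% / 100% split described above.
SAMPLE_RATES = {"high_volume": 0.10, "low_volume": 1.0}


def should_sample(trace_id, rate):
    """Keep a trace if the low 64 bits of its id fall below rate * 2**64.

    The decision depends only on trace_id, so it is identical in every process.
    """
    bound = int(rate * 2**64)
    return int(trace_id[-16:], 16) < bound


# Repeatable stand-ins for random 128-bit trace ids (32 hex characters each).
trace_ids = [hashlib.sha256(str(i).encode()).hexdigest()[:32] for i in range(10_000)]

kept_high = sum(should_sample(t, SAMPLE_RATES["high_volume"]) for t in trace_ids)
kept_low = sum(should_sample(t, SAMPLE_RATES["low_volume"]) for t in trace_ids)
```

With uniformly distributed ids, kept_high should land close to 10% of the total, and kept_low is every trace.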