Span Links: Connecting Async Flows
When a request triggers async work, span links connect the flows. This article covers the pattern, the tooling, and the visualisations.
When you need it
Span links are an OpenTelemetry feature for associating a span with one or more other spans, typically spans in other traces. Parent-child relationships connect spans within a trace; span links connect spans across traces. The pattern fits scenarios where work in one trace causes work in another.
Scenarios that need span links:
- Producer-consumer queues: A request enqueues a message; the consumer dequeues and processes it. The producer's work is in one trace; the consumer's work is in another. Without a link, the relationship is invisible.
  - The producer's request is one trace: The producer's work begins with the user's request and ends when the message is enqueued. That work is one trace.
  - The consumer's processing is another: The consumer's work begins when the message is dequeued and ends when processing completes. That work is its own trace; it has no parent in the producer's trace.
  - Without a link, you cannot follow the flow: The two traces are disconnected. Any investigation that crosses the queue boundary requires manual correlation; the team cannot navigate from producer to consumer.
- Batch jobs that process events from many requests: A nightly batch job processes events from many user requests. The batch job has its own trace, and so does each contributing request; links connect them.
  - The job's trace links to all the requests: The batch job's spans include links to the requests that contributed events. Investigation can navigate from the batch job to its inputs; the cross-trace flow is preserved.
The pattern fits async and batch flows. Synchronous flows use parent-child relationships; async flows use span links.
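The difference between the two relationships can be made concrete with a small data model. The sketch below is not the OpenTelemetry SDK; `SpanRef`, `Span`, `start_root`, and `start_child` are hypothetical names used only for illustration. A child span inherits its parent's trace ID, while a linked span starts a new trace and merely records a reference to the originating span.

```python
import secrets
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class SpanRef:
    """Identity of a span: enough to find it in a tracing backend."""
    trace_id: str  # 32 hex characters
    span_id: str   # 16 hex characters


@dataclass
class Span:
    name: str
    ref: SpanRef
    parent: Optional[SpanRef] = None                    # always in the same trace
    links: List[SpanRef] = field(default_factory=list)  # may point into other traces


def start_root(name: str, links: Sequence[SpanRef] = ()) -> Span:
    """Start a new trace; links record which earlier work caused it."""
    ref = SpanRef(secrets.token_hex(16), secrets.token_hex(8))
    return Span(name, ref, links=list(links))


def start_child(name: str, parent: Span) -> Span:
    """Continue the parent's trace (the synchronous, parent-child case)."""
    ref = SpanRef(parent.ref.trace_id, secrets.token_hex(8))
    return Span(name, ref, parent=parent.ref)


# Producer: the user's request and the enqueue step share one trace.
request = start_root("POST /orders")
enqueue = start_child("enqueue order", request)

# Consumer: a separate trace that links back to the enqueue span.
process = start_root("process order", links=[enqueue.ref])

# Batch job: one trace linking to every request that contributed an event.
contributing = [start_root(f"request {i}") for i in range(3)]
batch = start_root("nightly batch", links=[s.ref for s in contributing])
```

Here `process` has no parent and a different trace ID from `request`, yet `process.links` still identifies the exact span that enqueued the message. That reference is all a backend needs to offer navigation between the two traces.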
How
The implementation uses the OpenTelemetry Span API. The link is created with the originating trace's identifiers; the SDK records the link; the backend stores it.
- OTel's link API: The OpenTelemetry API lets you attach links to a span, either when the span is started or, in newer API versions, afterwards through a method such as Span#addLink() (the exact name varies by language). Each link carries the span context of the linked span. Adding links at span creation is preferable where possible, because samplers can only take into account the links present at that point.
  - Pass the trace ID and span ID of the originating work: The link references the originating span by its trace ID and span ID. The backend can find the linked span; the relationship is preserved.
- Many queue libraries are auto-instrumented: OpenTelemetry's instrumentation libraries for Kafka, RabbitMQ, SQS, and others can propagate context and create span links automatically. Where they do, the team's code does not need explicit linking.
  - Verify; some do not: Some queue instrumentations do not create span links. The team verifies the behaviour and adds explicit linking where needed.
- Custom flows need explicit linking: Custom queue implementations or non-standard flows require explicit linking code. The pattern is straightforward; the team adds it where the standard libraries do not.
The implementation effort is bounded: the standard libraries do most of the work, and explicit linking handles the cases they miss.
Visualisation
The visualisation of span links varies by vendor. Some vendors show them prominently; some store them but do not visualise them. The team should verify before relying on the visualisation.
- Vendor support varies: Different tracing backends handle span links differently. The capability is part of the OTel spec; the visualisation is vendor-specific.
  - Honeycomb, Datadog, and Tempo all show span links to varying degrees: Each vendor has its own approach. Some show links as connections in the trace UI; some show them as referenced spans; some show only metadata.
- Test before relying: The team tests how the vendor displays span links before relying on the visualisation. The test reveals what the team can actually see, so expectations align with reality.
- Some vendors store the link but do not visualise it: A vendor might preserve the link in the data without making it visible in the UI. The link is queryable but not browseable; the visualisation gap matters for some workflows.
  - Plan around the visualisation gaps: Where visualisation is incomplete, the team works around it. Custom dashboards, programmatic queries, and manual correlation are all options. The visualisation gap is bounded; the link's value is preserved.
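One programmatic workaround is a reverse lookup. A link is recorded on the span that declares it (the consumer or batch span), so navigating from a producer span to whatever consumed its message means searching for spans whose links point at it. The sketch below assumes span records have already been fetched from the backend as plain dicts; the record shape, the field names, and the helper `index_incoming_links` are hypothetical and would need adapting to the vendor's query API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

SpanKey = Tuple[str, str]  # (trace_id, span_id)


def index_incoming_links(spans: Iterable[Mapping]) -> Dict[SpanKey, List[SpanKey]]:
    """Map each linked-to span to the spans that link to it."""
    incoming = defaultdict(list)
    for span in spans:
        source = (span["trace_id"], span["span_id"])
        for link in span.get("links", []):
            incoming[(link["trace_id"], link["span_id"])].append(source)
    return dict(incoming)


# Hypothetical export: one producer span and two spans in other traces linking to it.
exported = [
    {"trace_id": "aaaa", "span_id": "01", "name": "enqueue", "links": []},
    {"trace_id": "bbbb", "span_id": "02", "name": "process",
     "links": [{"trace_id": "aaaa", "span_id": "01"}]},
    {"trace_id": "cccc", "span_id": "03", "name": "nightly-batch",
     "links": [{"trace_id": "aaaa", "span_id": "01"}]},
]

incoming = index_incoming_links(exported)
consumers = incoming[("aaaa", "01")]  # spans that were caused by the enqueue span
```

With an index like this, a script or custom dashboard can offer the producer-to-consumer navigation that a vendor UI may not provide.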
The cross-flow span link pattern is one of those advanced tracing techniques that pays off for systems with significant async or batch flows. Nova AI Ops integrates with tracing platforms, supports span links, and produces the cross-flow visibility that mature distributed tracing requires.