The Exemplar Pattern: Metrics to Traces
Exemplars link a slow metric data point to the trace that produced it. The pattern, the OTel support, and what teams gain.
The idea
The exemplar pattern bridges metrics and traces. Each metric data point can carry a sample trace ID that contributed to it; clicking on the metric data point jumps to the actual trace. The debugging loop closes; investigation moves from "the metric looks bad" to "this is what was happening".
What the pattern looks like:
- Each metric data point can carry an exemplar.: Beyond its value and timestamp, the data point includes an exemplar: one recorded measurement together with the trace ID (and usually the span ID) that was active when it was recorded. The trace itself lives in the tracing backend.
- A sample trace ID that contributed to that data point.: The exemplar is one trace from the many that contributed to the metric value. For a histogram bucket, the exemplar is one trace whose latency fell into that bucket.
- Click a slow p99 latency point.: The dashboard shows a p99 latency spike. The investigator clicks; the dashboard's exemplar feature jumps to a trace sampled from the high-latency histogram buckets behind that p99 value.
- Jump to the actual trace.: The team is now in the trace UI looking at a real slow trace. The investigation has gone from aggregate to specific in one click.
- The debugging loop closes.: Without exemplars, the team sees the metric and must independently find a representative trace. With exemplars, the trace is one click away. The friction in the debugging loop is dramatically reduced.
The pattern is simple but powerful. It connects two observability primitives that previously required manual correlation.
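The data model described above can be sketched in a few lines of plain Python. This is an illustration only, not the OpenTelemetry API: the `Exemplar` and `Histogram` classes and the `sampled` flag are hypothetical names, and the "last sampled measurement wins" policy is a simplification of what a real SDK reservoir does.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Exemplar:
    value: float      # the raw measurement that was recorded
    trace_id: str     # trace that was active when it was recorded
    timestamp: float  # when the measurement was taken


@dataclass
class Histogram:
    bounds: List[float]  # bucket upper bounds, ascending
    counts: List[int] = field(init=False)
    exemplars: List[Optional[Exemplar]] = field(init=False)

    def __post_init__(self) -> None:
        # One extra bucket catches values above the last bound (+Inf).
        n = len(self.bounds) + 1
        self.counts = [0] * n
        self.exemplars = [None] * n

    def record(self, value: float, trace_id: Optional[str],
               sampled: bool, timestamp: float) -> None:
        # Find the first bucket whose upper bound is >= value.
        i = bisect_left(self.bounds, value)
        self.counts[i] += 1
        # Assumption: keep an exemplar only when a sampled trace is active,
        # so the linked trace can actually be found in the tracing backend.
        if trace_id is not None and sampled:
            self.exemplars[i] = Exemplar(value, trace_id, timestamp)


h = Histogram([0.1, 0.5, 1.0])
h.record(0.05, "trace-a", True, 1.0)
h.record(0.7, "trace-b", True, 2.0)
h.record(0.9, None, False, 3.0)       # no active trace: counted, no exemplar
h.record(2.4, "trace-c", True, 4.0)
h.record(3.0, "trace-d", False, 5.0)  # unsampled trace: counted, no exemplar
```

Every measurement updates the bucket count, but only measurements tied to a sampled trace leave an exemplar behind. That is why a bucket can have a count and still have no trace to jump to.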
Support
Adoption depends on tooling support: the OpenTelemetry SDK, the metric storage, and the dashboard tool all need to handle exemplars. Support is increasingly available at each layer.
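- OpenTelemetry SDKs support exemplars.: The OTel SDK can attach an exemplar whenever a measurement is recorded inside an active, sampled span. No per-call code is needed, though maturity varies by language SDK and some require exemplars to be switched on.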
- Storage backends support them.: Prometheus and Mimir can store exemplars; for Cortex, VictoriaMetrics, and other backends, check what the deployed version supports. Where supported, the metric storage preserves the exemplar and makes it available to queries.
- Vendors are catching up.: Datadog, Grafana, and others increasingly support exemplars in their UIs. Some still require workarounds, but the trajectory is toward broad support.
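- Implementation is mostly configuration.: Enabling exemplars typically requires a configuration change in the SDK and the storage backend. In Prometheus, for example, exemplar storage sits behind the --enable-feature=exemplar-storage flag. The team's instrumentation does not need rewriting.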
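- The cost is small.: The overhead is bounded. Exemplar reservoirs keep a fixed number of exemplars per series (for a histogram, typically at most one per bucket), so storage grows with the number of series, not with traffic. For most teams the debugging benefit outweighs that cost.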
Support is increasingly comprehensive. Teams adopting now are likely to find most of their stack ready, with gaps concentrated in specific backends or UIs.
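To make the storage side concrete, here is a minimal sketch of how an exemplar travels with a histogram bucket in the OpenMetrics text format, one common wire format that Prometheus-style scrapers read. The exemplar follows a `#` after the sample value. The function name `format_bucket_line` and the metric name are hypothetical; real client libraries produce this line for you.

```python
from typing import Optional


def format_bucket_line(name: str, le: str, count: int,
                       trace_id: Optional[str] = None,
                       exemplar_value: Optional[float] = None,
                       exemplar_ts: Optional[float] = None) -> str:
    """Render one histogram bucket sample, optionally with an exemplar."""
    line = f'{name}_bucket{{le="{le}"}} {count}'
    if trace_id is not None and exemplar_value is not None:
        # Exemplar syntax: " # {labels} value [timestamp]"
        line += f' # {{trace_id="{trace_id}"}} {exemplar_value}'
        if exemplar_ts is not None:
            line += f' {exemplar_ts}'
    return line


line = format_bucket_line(
    "http_request_duration_seconds", "1.0", 11,
    trace_id="KOO5S4vxi0o", exemplar_value=0.67,
)
print(line)
# http_request_duration_seconds_bucket{le="1.0"} 11 # {trace_id="KOO5S4vxi0o"} 0.67
```

A bucket without an exemplar is rendered as an ordinary sample line, so enabling exemplars changes the exposition only where a trace was actually captured.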
The benefit
The benefit is faster, more reliable debugging. Mean-time-to-context drops; the investigation starts with real data; the patterns are clearer.
- Mean-time-to-context drops dramatically.: Without exemplars, finding a representative trace means searching the tracing backend by service, time window, and duration, which can take minutes. With exemplars, the trace is a click away. Debugging speeds up correspondingly.
- Engineers debug from data, not from guesses.: The exemplar provides actual data. The investigation is grounded in real traces; speculation is replaced by evidence.
- Particularly valuable for tail-latency investigations.: p99 latency investigations benefit most. The metric tells you "p99 spiked"; without exemplars, the team must find a slow trace to investigate; with exemplars, the slow trace is the exemplar.
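- The exemplar is a slow trace by construction.: An exemplar attached to the histogram bucket that contains the p99 value comes from a request whose latency fell into that bucket, so it is a slow trace by construction. It is still a single sample: it exists only if that request's trace was sampled, and it may not show the same cause as every other slow request.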
- Compounds across investigations.: Each future investigation benefits. The cumulative time savings are large; the team's debugging capability is strengthened.
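The tail-latency case can be sketched as a lookup: find the histogram bucket that contains the requested quantile, then return the exemplar stored for that bucket. This is a simplified illustration under assumed inputs (per-bucket counts and one trace ID per bucket). The function name `exemplar_for_quantile` is hypothetical and does not correspond to any dashboard's or backend's actual query API.

```python
from typing import List, Optional, Tuple


def exemplar_for_quantile(
    q: float,
    bounds: List[float],            # bucket upper bounds, ascending
    counts: List[int],              # per-bucket (non-cumulative) counts
    trace_ids: List[Optional[str]]  # one exemplar trace ID per bucket, or None
) -> Optional[Tuple[float, Optional[str]]]:
    """Return (bucket upper bound, exemplar trace ID) for quantile q."""
    total = sum(counts)
    if total == 0:
        return None
    rank = q * total
    seen = 0
    for bound, count, trace_id in zip(bounds, counts, trace_ids):
        seen += count
        # The first bucket whose cumulative count reaches the rank
        # contains the quantile.
        if count > 0 and seen >= rank:
            return bound, trace_id
    return None


bounds = [0.1, 0.5, 1.0, float("inf")]
counts = [900, 80, 15, 5]
trace_ids = ["t-fast", "t-mid", "t-slow", "t-very-slow"]

print(exemplar_for_quantile(0.99, bounds, counts, trace_ids))
# (1.0, 't-slow')
```

The returned trace is one request from the p99 bucket, not necessarily the slowest request overall. If the bucket holds no exemplar, the trace ID comes back as `None` and the investigator falls back to a manual trace search.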
The exemplar pattern is one of those observability bridges that pays off across many investigations. Nova AI Ops integrates with metric and trace storage, supports exemplars across the pipeline, and produces the integrated debugging experience that mature observability requires.