First Jaeger Install
Distributed tracing.
Overview
The first Jaeger install moves distributed tracing from theory to production. Jaeger collects, stores, and visualises spans across service boundaries; the patterns established in the first deploy survive every subsequent service the team adds.
- Cross-service tracing. Spans propagate across HTTP, gRPC, and queue boundaries. Investigations see the full request path.
- OTel-compatible. Recent Jaeger releases accept OpenTelemetry Protocol (OTLP) data natively. Use OTel SDKs for instrumentation rather than the deprecated Jaeger-specific clients.
- UI for trace analysis. Visual span trees and service dependency graphs. The UI is what makes traces actually useful.
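- Sampling and storage backends. Head-based or tail-based sampling is chosen to match trace volume; Cassandra or Elasticsearch provides persistent storage sized to the retention need, while in-memory storage suits only local testing because traces are lost on restart.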
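As a rough illustration of how a trace crosses an HTTP boundary, the sketch below parses and emits the W3C `traceparent` header, the default propagation format in OTel SDKs. It is a simplified, stdlib-only model rather than the SDK's implementation: `SpanContext`, `extract`, `start_child`, and `inject` are hypothetical names chosen for this example, and the parser is stricter than the specification about future header versions.

```python
import re
import secrets
from typing import NamedTuple, Optional

# traceparent layout: version-traceid-parentid-flags, all lowercase hex.
_TRACEPARENT = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class SpanContext(NamedTuple):
    trace_id: str  # 32 hex chars, shared by every span in the trace
    span_id: str   # 16 hex chars, unique per span
    sampled: bool  # low bit of the flags byte


def extract(headers: dict) -> Optional[SpanContext]:
    """Parse an incoming traceparent header; None if absent or malformed.

    Assumes header names were already lowercased by the web framework.
    """
    match = _TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    version, trace_id, span_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid.
    if version == "ff" or trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


def start_child(parent: Optional[SpanContext]) -> SpanContext:
    """Continue the caller's trace, or start a new one if there is no parent."""
    if parent is None:
        # Root span: new trace ID. Marked sampled here only for illustration.
        return SpanContext(secrets.token_hex(16), secrets.token_hex(8), True)
    # Child span: same trace ID, fresh span ID, inherited sampling decision.
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx: SpanContext, headers: dict) -> None:
    """Write the context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"
```

A service would call `extract` on the inbound request, `start_child` for its own span, and `inject` on every outbound call. The unchanged trace ID is what lets Jaeger stitch the spans from separate services into one trace.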
The approach
Three habits keep tracing useful instead of becoming a write-only data lake: instrument with OTel, sample deliberately, and retain for the actual investigation window.
- OTel SDKs. Standard instrumentation across language runtimes. Avoids per-language tracing libraries.
- Sample 5 to 10 percent. Head-sample 5 to 10 percent of ordinary traffic as a starting point. Full sampling is expensive and rarely earns its storage cost.
- Tail-sampling for errors. Always keep traces with errors or unusual latency, deciding after the trace has completed. Investigations need the bad traces, not a random sample.
- 7-day retention plus conventions. Most investigation happens in the recent past; conventions for span names and attributes produce consistent traces across services.
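The two sampling habits above can be sketched as a pair of decisions: a deterministic head decision keyed on the trace ID, and a tail decision made once the trace has finished. This is a minimal illustration, not how Jaeger or the OTel Collector implements sampling. `HEAD_SAMPLE_RATIO`, `SLOW_THRESHOLD_MS`, `FinishedTrace`, `head_sample`, and `tail_keep` are hypothetical names, and the one-second latency threshold is an assumption to replace with the team's own definition of "slow".

```python
from dataclasses import dataclass

HEAD_SAMPLE_RATIO = 0.10   # assumption: upper end of the 5 to 10 percent range
SLOW_THRESHOLD_MS = 1_000  # assumption: hypothetical "unusual latency" cutoff


def head_sample(trace_id: str, ratio: float = HEAD_SAMPLE_RATIO) -> bool:
    """Decide at the start of a trace, using only the trace ID.

    Comparing the low 64 bits of the ID against a threshold means every
    service reaches the same verdict for the same trace without coordination.
    """
    bound = int(ratio * (1 << 64))
    return int(trace_id[-16:], 16) < bound


@dataclass
class FinishedTrace:
    trace_id: str       # 32 hex chars
    duration_ms: float  # end-to-end duration of the root span
    has_error: bool     # True if any span in the trace recorded an error


def tail_keep(trace: FinishedTrace) -> bool:
    """Decide after the trace completes: keep every bad trace, sample the rest."""
    if trace.has_error:
        return True
    if trace.duration_ms >= SLOW_THRESHOLD_MS:
        return True
    # Healthy, fast traces fall back to the baseline ratio.
    return head_sample(trace.trace_id)
```

The trade-off is that a tail decision needs the whole trace buffered somewhere before it can run, so it costs memory in the collection path, while the head decision is free but blind to how the request turns out.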
Why this compounds
The first Jaeger install takes effort to wire correctly. Each subsequent service plugs into the existing infrastructure; team visibility compounds with every instrumented service.
- Faster cross-service investigation. Traces show the full request path. Mean time to resolution (MTTR) drops on the multi-service incidents the team sees most.
- Dependency understanding. Service maps reveal the actual architecture, not the diagram. New engineers ramp faster against the real graph.
- Performance visibility. Span timing reveals bottlenecks that aggregate metrics miss. Optimisation targets become specific.
- Year-one investment, year-two habit. The first install is a heavy lift. By year two, every service ships with tracing from day one.
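To make the dependency and performance points concrete, the sketch below derives service-to-service edges and per-span self time from the parent/child links in a single trace. It is an illustrative model, not Jaeger's own computation: the `Span` record, `dependency_edges`, and `self_times` are hypothetical, the sample data is invented, and the self-time calculation assumes a span's children run one after another rather than in parallel.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    start_ms: float
    end_ms: float


def dependency_edges(spans):
    """Count caller -> callee service pairs implied by parent/child links."""
    by_id = {s.span_id: s for s in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get(span.parent_id)
        if parent is not None and parent.service != span.service:
            edges[(parent.service, span.service)] += 1
    return edges


def self_times(spans):
    """Span duration minus the time spent in its direct children."""
    child_total = defaultdict(float)
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] += span.end_ms - span.start_ms
    return {
        s.span_id: max(0.0, (s.end_ms - s.start_ms) - child_total[s.span_id])
        for s in spans
    }


# Invented example trace: gateway -> orders -> (inventory, payments).
trace = [
    Span("a", None, "gateway", 0.0, 120.0),
    Span("b", "a", "orders", 10.0, 110.0),
    Span("c", "b", "inventory", 20.0, 40.0),
    Span("d", "b", "payments", 40.0, 100.0),
]
edges = dependency_edges(trace)
own = self_times(trace)
slowest = max(own, key=own.get)  # span "d", the payments call
```

In this invented trace the `orders` span has the second-longest duration, yet most of that time belongs to its `payments` child. Span-level timing exposes that distinction where a per-service latency average would not.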