Distributed Tracing with OpenTelemetry in 45 Minutes (Tutorial)
Two services, one trace, end-to-end: from nothing installed to a working Jaeger UI showing the call graph, on a 45-minute path that skips the rabbit holes.
Step 1: Run Jaeger as the backend (5 min)
Jaeger is the easy default trace backend. One container; built-in UI on port 16686.
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
Open http://localhost:16686. The UI is empty; nothing is sending traces yet. Ports 4317 (gRPC) and 4318 (HTTP) accept OTLP, OpenTelemetry's standard wire format.
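To sanity-check the backend before wiring anything else, you can hit Jaeger's query API (the same one the UI uses). It should return JSON with an empty or near-empty service list for now:

curl -s http://localhost:16686/api/services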
Step 2: Run the OTel Collector (10 min)
The Collector is the recommended production pattern: services send traces to a local Collector; the Collector batches, processes, and forwards to backends. For local dev you could skip it and point services straight at Jaeger, but set it up now and build the habit.
cat > otel-collector-config.yaml << 'EOF'
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/jaeger]
EOF
docker run -d --name otel-collector \
  --link jaeger \
  -p 4327:4317 -p 4328:4318 \
  -v $PWD/otel-collector-config.yaml:/etc/otel-collector-config.yaml \
  otel/opentelemetry-collector-contrib:latest \
  --config=/etc/otel-collector-config.yaml
Apps will send to the Collector on 4327/4328 (remapped so they don't clash with Jaeger's own published 4317/4318); the Collector forwards to Jaeger.
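To confirm the pipeline actually started, tail the Collector's logs. Exact wording varies by Collector version, but you should see both OTLP receivers come up on 4317 and 4318:

docker logs otel-collector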
Step 3: Instrument service A (10 min)
Node.js example. The OTel auto-instrumentations cover Express, fetch, and most popular libraries with zero code changes beyond initialization.
npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-grpc
// tracing.js, load BEFORE the app
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  serviceName: 'service-a',
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4327' }),
  instrumentations: [getNodeAutoInstrumentations()],
});
sdk.start();
// app.js
require('./tracing'); // must come first
const express = require('express');
const fetch = require('node-fetch'); // v2; v3 is ESM-only and cannot be require()d
const app = express();

app.get('/a', async (req, res) => {
  const r = await fetch('http://localhost:3001/b');
  res.json(await r.json());
});
app.listen(3000);
Step 4: Instrument service B (10 min)
Same shape. Critical: same Collector endpoint, different serviceName.
// tracing.js, same as A but serviceName: 'service-b'
// app.js
require('./tracing');
const express = require('express');
const app = express();
app.get('/b', (req, res) => res.json({ ok: true }));
app.listen(3001);
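Duplicating tracing.js per service works but gets tedious. The SDK also honors the standard OTEL_SERVICE_NAME environment variable, so a single shared file can serve both services; a minimal sketch:

// tracing.js, shared by both services; the name comes from OTEL_SERVICE_NAME
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4327' }),
  instrumentations: [getNodeAutoInstrumentations()],
}).start();

Then start each service as OTEL_SERVICE_NAME=service-a node app.js and OTEL_SERVICE_NAME=service-b node app.js.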
Step 5: Make the cross-service call (5 min)
Run both services. Hit http://localhost:3000/a a few times. The auto-instrumentation propagates trace context via the W3C traceparent header, so service B's span automatically becomes a child of service A's span.
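To see the mechanism yourself, a throwaway middleware in service B (purely illustrative, not needed for tracing) prints the propagated header; its format is 00-<trace-id>-<parent-span-id>-<trace-flags>:

// service B, registered before the /b route
app.use((req, res, next) => {
  console.log('traceparent:', req.headers.traceparent);
  next();
});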
Step 6: View the trace in Jaeger (5 min)
Open http://localhost:16686. Pick "service-a" from the Service dropdown and click "Find Traces." You should see traces spanning both services: service A's server and client spans, with service B's span nested as a child (the Express auto-instrumentation adds a few middleware spans too). Drill in for timing detail.
The 45-minute clock stops here. Total cost: $0, three containers, two small Node services. You have working distributed tracing.
Four common pitfalls
Forgetting to require tracing.js BEFORE the app. The auto-instrumentations patch libraries at require time, so tracing.js has to load before express and friends are imported. Wrong order = empty traces.
Sending every span without sampling. The default sampler is parentbased_always_on, which keeps everything; at any meaningful volume this overwhelms the Collector. Start with OTEL_TRACES_SAMPLER=parentbased_traceidratio and OTEL_TRACES_SAMPLER_ARG=0.1 to keep roughly 10% of traces.
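The same policy can be set in code instead of the environment; a sketch using the SDK's built-in samplers:

// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { ParentBasedSampler, TraceIdRatioBasedSampler } = require('@opentelemetry/sdk-trace-node');

const sdk = new NodeSDK({
  // keep ~10% of new traces; always honor an existing parent's decision
  sampler: new ParentBasedSampler({ root: new TraceIdRatioBasedSampler(0.1) }),
  // ...serviceName, traceExporter, instrumentations as before
});
sdk.start();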
Missing context propagation through queues. HTTP context propagates automatically; SQS/Kafka/RabbitMQ require manual injection of trace headers into message attributes. The async hop is the most common gap.
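What that injection looks like with the @opentelemetry/api package; sendToQueue and processMessage are hypothetical stand-ins for your queue client:

const { context, propagation } = require('@opentelemetry/api');

// Producer: copy the active trace context into the message
function publish(body) {
  const headers = {};
  propagation.inject(context.active(), headers); // writes traceparent/tracestate keys
  sendToQueue({ body, headers }); // stand-in for your SQS/Kafka/RabbitMQ send
}

// Consumer: restore the context so spans here join the producer's trace
function onMessage(msg) {
  const parent = propagation.extract(context.active(), msg.headers);
  context.with(parent, () => processMessage(msg.body)); // stand-in handler
}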
Hardcoding the Collector URL. Configure it via the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable. Different environments need different endpoints; do not bake one into code.
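Concretely: drop the url option and set the variable per environment. The endpoint below is just this tutorial's Collector mapping:

// tracing.js: no hardcoded endpoint; the exporter reads OTEL_EXPORTER_OTLP_ENDPOINT
const traceExporter = new OTLPTraceExporter();

// then, per environment:
//   OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4327 node app.js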
Where to go next
Three production upgrades, in order of value.
(1) Replace head-based sampling with tail-based sampling (preferentially keep slow and errored traces) using the Collector's tail_sampling processor; a config sketch follows below.
(2) Add manual spans for business-meaningful operations; auto-instrumentation will not cover the function calls you most want to see.
(3) Migrate from Jaeger to a longer-retention backend (Tempo, Grafana Cloud, Honeycomb) before you exceed local storage.
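A sketch of the tail_sampling config for upgrade (1); the wait time and thresholds are illustrative, not recommendations:

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: {status_codes: [ERROR]}
      - name: keep-slow
        type: latency
        latency: {threshold_ms: 500}

Chain it before batch in the traces pipeline; the Collector then buffers each trace until decision_wait expires and applies the policies to the whole trace at once.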