By Samson Tanimawo, PhD · Published Sep 20, 2026 · 6 min read

OpenTelemetry Collector in 30 Minutes: A Working Setup

Most OTel docs leave you with a 200-line YAML file and no picture of how it runs in production. Here's a working OTel Collector deployment with the pieces that actually matter: receivers, processors, batching, and where to send the data.

Why the Collector and not the SDK

You can ship telemetry directly from your app SDK. You shouldn't. Putting a Collector between your apps and your backend gives you batching, retry, sampling, transformation, and a single place to swap vendors. It pays for itself the first time you change observability vendors.

The vendor-swap value. Without a Collector, every service is configured directly against the vendor (URLs, API keys, format-specific code), and switching vendors means touching every service. With a Collector, the swap happens in one place; services don't even need to redeploy.

The other benefits. Batching cuts request overhead by roughly 10x. Retry handles transient backend failures without losing data. Sampling lets you reduce volume without losing the signal. Transformation enriches every span and metric with consistent labels (env, region, team), so you never end up with half-tagged data.

The shape of a Collector config

Three sections: receivers (where data comes in), processors (what we do to it), exporters (where it goes). Plus a service.pipelines block that wires the three together.

The pipelines abstraction is critical. Different telemetry types (traces, metrics, logs) can use different pipelines: traces might pass through a sampling processor before export, while metrics go through aggregation instead. Pipelines let you compose receivers, processors, and exporters per signal type, as the sketch below shows.
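Here's that shape as a minimal sketch: one OTLP receiver, the two must-have processors, one exporter, and two pipelines wiring them together. The endpoint is a placeholder; swap in your backend's URL.

```yaml
# Minimal Collector config. The exporter endpoint is a placeholder.
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
  batch:

exporters:
  otlphttp:
    endpoint: https://otlp.example.com   # placeholder backend URL

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]   # memory_limiter always first
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
```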

The discipline of small configs. The best Collector configs are 100-200 lines. Configs over 500 lines are usually doing too much; either split them across multiple Collectors or simplify. Small configs are debuggable; large ones are not.

Receivers

Start with two: otlp (gRPC + HTTP for app data) and prometheus (scrapes existing Prometheus targets). Most teams need exactly these two for the first 6 months.

The OTLP receiver in detail. OpenTelemetry's native protocol; supports traces, metrics, and logs. Apps emit OTLP via the OpenTelemetry SDK. The Collector receives via gRPC (4317) or HTTP (4318). Most modern OTel-instrumented apps work out of the box.
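A sketch of the receiver block, listening on both default ports (bind addresses depend on where you run it; 0.0.0.0 is typical inside a container):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # app SDKs sending OTLP over gRPC
      http:
        endpoint: 0.0.0.0:4318   # app SDKs sending OTLP over HTTP
```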

The Prometheus receiver. Scrapes existing Prometheus targets; useful when you have Prometheus-instrumented services that haven't moved to OTLP yet. It's configured the same way as Prometheus's own scrape config, so the team can copy-paste existing jobs.
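A sketch with a hypothetical job name and targets; the scrape_configs block uses the same schema as prometheus.yml:

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: legacy-services      # hypothetical job name
          scrape_interval: 30s
          static_configs:
            - targets: ['app-a:9090', 'app-b:9090']   # hypothetical targets
```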

Receivers to add later. Filelog (for log files), kubeletstats (for K8s pod-level metrics), hostmetrics (for the host the Collector runs on). Each adds capability but also complexity; add deliberately.

Processors that every deployment needs

The batch processor in detail. Groups telemetry into bundles (by default 8192 records or 200ms, whichever comes first). Reduces request count from N small requests to one batched request. The latency impact is small (<200ms); the throughput improvement is large.
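The defaults are sensible; spelling them out makes them visible to the next reader:

```yaml
processors:
  batch:
    send_batch_size: 8192   # records per batch (the default)
    timeout: 200ms          # flush a partial batch after 200ms (the default)
```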

The memory_limiter is non-negotiable. Without it, a Collector under heavy load runs out of memory and gets OOM-killed; the restart loses ~30 seconds of in-flight telemetry. With it, the Collector applies backpressure upstream, apps see a "Collector busy, retry" response, and the pipeline degrades gracefully instead of crashing.
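A sketch sized for a container with roughly 2 GiB of memory; the numbers are assumptions to tune, not recommendations:

```yaml
processors:
  memory_limiter:
    check_interval: 1s     # how often memory usage is sampled
    limit_mib: 1536        # hard limit, ~75-80% of the container's memory
    spike_limit_mib: 384   # extra headroom for sudden bursts
```

List memory_limiter first in every pipeline so it can reject data before the other processors spend work on it.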

The resource processor's value. Adds service.name, environment, region, etc. to every span and metric. Downstream queries can filter by these. Without resource enrichment, traces and metrics from multiple services blur together; with it, every signal is properly tagged.
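A sketch with hypothetical values; the upsert action overwrites the key if the app already set it:

```yaml
processors:
  resource:
    attributes:
      - key: deployment.environment
        value: production           # hypothetical value
        action: upsert
      - key: cloud.region
        value: eu-west-1            # hypothetical value
        action: upsert
```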

Exporters

Pick one to start. otlphttp sends to any compatible backend. prometheusremotewrite for Prometheus-flavoured metrics. Configure a single backend; resist the temptation to fan out to three vendors at once.
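A sketch of a single otlphttp exporter; the endpoint and header name are placeholders, since every vendor names its auth header differently (check their docs):

```yaml
exporters:
  otlphttp:
    endpoint: https://otlp.your-vendor.example   # placeholder endpoint
    headers:
      x-api-key: ${env:VENDOR_API_KEY}           # hypothetical header name and env var
```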

The fan-out trap. Engineers love the idea of sending data to multiple vendors for comparison. The Collector supports this; the operational complexity multiplies. Each backend has its own credentials, rate limits, error modes. The first failure is harder to diagnose because data flows in three directions.

Single-backend discipline. Pick one observability vendor; commit to it. The Collector's swap-friendliness means you CAN change later without rewriting services; that's different from running multiple backends in parallel forever.

Failure modes that bite

Collectors not pinned to a version (config breaks on minor bumps). No memory_limiter (OOM kills). No retry on exporter failures (you lose data when backend hiccups). Single replica per region (a Collector restart drops a 30-second window). Solve each before they hurt.

The version-pinning issue. Collector configs evolve; minor version updates can deprecate processors or change syntax. Pinning to a specific version (e.g., 0.95.0) and updating deliberately prevents surprise breakage. Teams that auto-update get burned; teams that pin avoid the issue.
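In a Kubernetes Deployment this is one line; the fragment below is a sketch:

```yaml
containers:
  - name: otel-collector
    image: otel/opentelemetry-collector-contrib:0.95.0   # pinned; never :latest
```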

The single-replica issue. Most teams deploy one Collector per region. Restarting it (for upgrades, config changes, OOMs) drops 30+ seconds of data. Run at least 2 replicas per region with a load balancer; restart one at a time.
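A Deployment spec fragment (sketch) that keeps one replica serving while the other restarts:

```yaml
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take both replicas down at once
      maxSurge: 1
```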

Retry configuration. The OTLP exporter has retry logic; configure it explicitly. Without retry, a transient backend hiccup loses data. With retry, the Collector backs off and resends instead of silently dropping.
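A sketch using the exporter's retry_on_failure and sending_queue settings; the endpoint is a placeholder:

```yaml
exporters:
  otlphttp:
    endpoint: https://otlp.your-vendor.example   # placeholder
    retry_on_failure:
      enabled: true
      initial_interval: 5s     # first retry after 5s
      max_interval: 30s        # exponential backoff caps here
      max_elapsed_time: 300s   # give up after 5 minutes
    sending_queue:
      enabled: true
      queue_size: 5000         # batches buffered while the backend is down
```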

Common antipatterns

Apps shipping directly to the vendor. No Collector; every app has vendor-specific code. Painful to migrate when changing vendors. Always route through a Collector.

Collector running on the same node as the app. "It's faster locally." It is faster, until the Collector OOMs and takes the app down with it. Run Collectors in their own pods/nodes.

Configs without comments. The Collector config that worked 6 months ago is mysterious now. Always document why each processor or exporter exists; the future-you who modifies the config will thank present-you.

Skipping the silent run. New Collector deployment goes straight to production. The first time it sees production traffic, OOMs or rate limits surface. Run in shadow first (parallel to existing pipeline); validate before cutover.

What to do this week

Three moves. (1) If you don't have a Collector deployed, deploy one with the minimum config (otlp receiver + batch + memory_limiter + resource processors + one exporter). (2) Audit your existing Collector config for the three required processors. Most teams find they're missing memory_limiter. (3) Confirm the Collector has at least 2 replicas per region. Single-replica deployments are silent reliability holes.