Comparisons By Samson Tanimawo, PhD Published Jul 23, 2025 9 min read

OpenTelemetry vs Vendor Agents: The Tradeoffs Nobody Talks About

OTel is the right bet for portability. It's also three times the operational surface. Here's the honest comparison.

The portability pitch

The OpenTelemetry pitch is simple: instrument your code once, choose your backend later. Want to move from Datadog to Honeycomb next year? Change the exporter, not 50,000 lines of instrumentation.
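
A toy model of that claim, using only the Python standard library. It is not the OpenTelemetry API: `Span`, `Tracer`, `InMemoryExporter`, `JsonLinesExporter`, and `checkout` are hypothetical names, chosen only to show why a call site written against a tracer interface stays the same when the exporter behind it changes.

```python
import io
import json
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)


class Exporter(Protocol):
    def export(self, span: Span) -> None: ...


class InMemoryExporter:
    """Stands in for 'backend A': keeps spans in a list."""

    def __init__(self) -> None:
        self.spans: list[Span] = []

    def export(self, span: Span) -> None:
        self.spans.append(span)


class JsonLinesExporter:
    """Stands in for 'backend B': writes one JSON object per span."""

    def __init__(self, stream: io.TextIOBase) -> None:
        self.stream = stream

    def export(self, span: Span) -> None:
        self.stream.write(json.dumps({"name": span.name, **span.attributes}) + "\n")


class Tracer:
    """The only thing application code talks to."""

    def __init__(self, exporter: Exporter) -> None:
        self._exporter = exporter

    def record(self, name: str, **attributes) -> None:
        self._exporter.export(Span(name, attributes))


def checkout(tracer: Tracer, order_id: str) -> None:
    # Instrumentation call site: identical whichever exporter is wired in.
    tracer.record("checkout", order_id=order_id)
```

Swapping `InMemoryExporter` for `JsonLinesExporter` touches only the line that constructs the `Tracer`; `checkout` is untouched. In the real SDKs the equivalent swap is usually configuration, for example pointing the OTLP exporter at a different endpoint through the `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable.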

For teams that have lived through a vendor migration, this alone justifies the switch. Rewriting every custom metric and span call is a uniquely miserable engineering experience.

Operational surface is bigger

Vendor agents are one binary, one config, one support phone number. OpenTelemetry is an SDK per language, an agent or sidecar, a collector, and exporters, each with its own version cadence and breakage history.

Teams underestimate how much time goes into keeping a collector fleet healthy. Budget for it; it is not free.

What OTel still doesn't have as polished

Where vendor agents still win

Three situations where vendor agents are still the right choice:

  1. Your stack is small, your team is small, and you don't expect to change vendors in the next 3 years.
  2. You need tight integration with the vendor's APM features (e.g., Datadog Watchdog, New Relic AIOps) that OTel can't replicate today.
  3. Your language/runtime doesn't have a mature OTel SDK yet.

A migration path that works

Big-bang migrations fail. Iterative migrations succeed. The pattern we've seen work:

  1. Stand up the OTel collector alongside your vendor agent. Don't remove anything yet.
  2. Add the OTel SDK to one service. Point the service's exporters at the collector, then fan the data out from there to both your vendor and a neutral backend.
  3. Compare signal. When OTel's traces and metrics match the vendor's for a service, disable that service's vendor agent.
  4. Repeat, one service per sprint. The full migration takes 6 to 12 months for a 50-service org, and the team never stops shipping product work.
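
Step 3 needs a concrete definition of "match". A minimal sketch, assuming you can pull per-metric totals for one service from both pipelines over the same time window. The `signals_match` helper and the 2% default tolerance are illustrative assumptions, not values from any vendor or from OpenTelemetry.

```python
import math


def signals_match(vendor: dict[str, float],
                  otel: dict[str, float],
                  rel_tolerance: float = 0.02) -> tuple[bool, list[str]]:
    """Compare per-metric totals from two pipelines for one service.

    Returns (ok, problems), where problems lists every vendor metric that is
    missing on the OTel side or differs by more than rel_tolerance.
    """
    problems = []
    for name, vendor_value in sorted(vendor.items()):
        if name not in otel:
            problems.append(f"{name}: missing from OTel")
        elif not math.isclose(vendor_value, otel[name], rel_tol=rel_tolerance):
            problems.append(f"{name}: vendor={vendor_value} otel={otel[name]}")
    return (not problems, problems)
```

The check is deliberately one-directional: it asks whether everything the vendor agent reports also shows up through OTel, which is the question that matters before switching the agent off. Extra metrics on the OTel side are not treated as a problem.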

The services that take longest are the oldest ones with custom vendor API calls. Budget those for the end.
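
One way to turn that into a schedule, as a sketch: sort services by how many custom vendor API call sites they contain and assign one per sprint, fewest first. `plan_migration` is a hypothetical helper and the service names and counts are made up; how you count call sites (for example, searching for the vendor client import) is left to you.

```python
def plan_migration(custom_call_sites: dict[str, int]) -> list[tuple[int, str]]:
    """Return (sprint_number, service) pairs, one service per sprint.

    Services with the fewest custom vendor API call sites go first, so the
    hardest migrations land at the end. Ties are broken by name so the plan
    is stable between runs.
    """
    ordered = sorted(custom_call_sites, key=lambda svc: (custom_call_sites[svc], svc))
    return [(sprint, svc) for sprint, svc in enumerate(ordered, start=1)]


# Illustrative input only.
plan = plan_migration({"billing-legacy": 140, "search": 3, "auth": 0, "notifications": 3})
for sprint, service in plan:
    print(f"sprint {sprint}: {service}")
```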

What not to do in the first month

Do not instrument every service. Pick one, pick something not business-critical, and learn the SDK and collector together.

Do not delete the vendor agent alongside the migration. Run both. Compare the signal. Delete the agent only after the OTel side has been steady for a full sprint.

Do not change the observability backend and the agent at the same time. One variable at a time. You will thank yourself during the first weird behaviour.