Vendor Lock-In via SLO Tooling
SLO tools become hard to switch.
Risk
SLO tooling looks like a tactical decision (which dashboard product to use) and acts like a strategic one (multi-year vendor commitment with rising switching costs). Most teams pick a vendor based on first-year features and discover, two years later, that they have built a reliability practice on top of vendor-specific definitions, dashboards, and integrations that do not travel.
What vendor lock-in actually costs:
- SLO definitions are not portable: Each vendor has its own format for SLO and SLI definitions. Datadog's syntax differs from New Relic's, which differs from Honeycomb's, which differs from open-source Prometheus-based stacks. Migrating means rewriting every SLO definition the team has built up.
- Historical data does not travel: Most SLO platforms compute against their own metric store, so the historical data lives in their database. When you migrate, you lose the rolling-window history that anchored your SLO calculations, and burn-rate alerts have to be recalibrated from scratch.
- Dashboard work has to be redone: Custom dashboards, alert routing, Slack and PagerDuty integrations, and runbook links are all configured per vendor. For a serious SLO practice with 50 services and many custom dashboards, recreating them takes hundreds of engineer-hours.
- Switching cost is real: The total cost of switching SLO platforms is typically 6 to 12 engineer-months for a mid-size org, high enough that most teams do not switch even when they want to. The lock-in is not just contractual; it is operational.
- Vendor decisions become organization-shaping: Once a team has standardized on a vendor for SLO management, it is unlikely to switch within five years even if a better tool appears. The sunk investment is too large. Picking the vendor is picking the practice for a long time.
The lock-in is not malicious; it emerges naturally from the way SLO tooling works. The countermove is recognizing it before the commitment crystallizes, not after.
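To make the point about lost history concrete, here is a minimal Python sketch of a rolling-window burn-rate calculation. The function names, the 28-day window, the 99.9% target, and the request counts are all illustrative assumptions, not any vendor's implementation. It compares the burn rate computed over a full window with the one a freshly migrated platform would see if only the last five days of data came along.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DayCounts:
    """Good and total request counts for one day (hypothetical data shape)."""
    good: int
    total: int


def burn_rate(days: Sequence[DayCounts], target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    A value of 1.0 means the error budget is being spent at exactly
    the pace that would exhaust it at the end of the window.
    """
    total = sum(d.total for d in days)
    if total == 0:
        raise ValueError("no traffic in window")
    bad = total - sum(d.good for d in days)
    allowed_error_rate = 1.0 - target
    return (bad / total) / allowed_error_rate


def window_coverage(days_available: int, window_days: int) -> float:
    """Fraction of the rolling window that actually has data behind it."""
    return min(days_available, window_days) / window_days


# Assumed example data: 28 days of traffic with one incident day.
history = [DayCounts(good=99_990, total=100_000) for _ in range(28)]
history[7] = DayCounts(good=98_000, total=100_000)  # incident day

# Burn rate with the full 28-day history available.
full = burn_rate(history, target=0.999)

# Burn rate right after a migration, when only the last 5 days exist.
post_migration = burn_rate(history[-5:], target=0.999)

print(f"full window:    {full:.2f}")
print(f"post-migration: {post_migration:.2f} "
      f"(coverage {window_coverage(5, 28):.0%})")
```

With these made-up numbers the full-window burn rate is roughly 0.81, while the post-migration figure is roughly 0.10 because the incident day is no longer visible. Alert thresholds tuned against the first number do not mean the same thing against the second.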
Standard
The way to avoid vendor lock-in for SLO tooling is to keep the SLO definitions in a vendor-neutral format. The OpenSLO standard exists exactly for this purpose: a YAML schema that describes SLOs, SLIs, and error budget policies in a way that any compatible platform can ingest.
- OpenSLO standard: A community-maintained YAML spec for SLOs, SLIs, and error budget policies. Platform support varies: some platforms ingest it natively and others rely on converters or export tooling, so confirm the current state for whichever vendor you are evaluating (Datadog, New Relic, Nobl9, Sumo Logic, or another). Definitions are version-controlled in git, just like Terraform or Kubernetes manifests.
- Portable definitions: An OpenSLO definition reads the same regardless of which vendor consumes it. Migrating between platforms is mostly the work of pointing the new platform at the same OpenSLO files. The platform-specific configuration shrinks to a thin layer that has to be reauthored; the SLO definitions themselves are reusable.
- Source of truth in git, not in vendor UI: The SLO definitions live in your repo, not in the vendor's dashboard. PR review applies to SLO changes the same way it does to code. The audit trail is in git. Rollback is git revert. The vendor becomes a renderer, not the system of record.
- Vendor-agnostic alerting and dashboards as code: Alerts and dashboards built on top of OpenSLO definitions can be expressed in vendor-agnostic configuration (Terraform with multiple providers, or a templating layer). The dashboard work that is normally lost on migration becomes portable too.
- Adoption cost is real but bounded: Migrating an existing SLO practice to OpenSLO takes roughly a quarter of focused work for a mid-size team. The benefit is permanent vendor optionality. The math favors the migration as long as you expect to keep running SLO tooling for more than 12 months.
OpenSLO is not perfect (some vendor features are not in the spec, some platforms support a subset). It is good enough that the lock-in cost drops by an order of magnitude. That is the level of optionality worth investing in.
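The "vendor as renderer" idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not an OpenSLO implementation: the `SloDefinition` fields only loosely mirror OpenSLO concepts, the query strings are placeholders, and both renderer payloads are invented shapes that do not match any real vendor API. The point is the structure: one reviewed definition, a validation step that could run in CI on every PR, and a thin per-vendor layer.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SloDefinition:
    """Vendor-neutral SLO record (hypothetical; loosely OpenSLO-shaped)."""
    name: str
    service: str
    target: float       # e.g. 0.999
    window_days: int    # rolling window length in days
    good_query: str     # query selecting good events (opaque string here)
    total_query: str    # query selecting all valid events


def validate(slo: SloDefinition) -> list[str]:
    """Return a list of problems; an empty list means the checks pass."""
    problems = []
    if not 0.0 < slo.target < 1.0:
        problems.append(f"{slo.name}: target must be between 0 and 1")
    if slo.window_days <= 0:
        problems.append(f"{slo.name}: window_days must be positive")
    if not slo.good_query or not slo.total_query:
        problems.append(f"{slo.name}: both queries are required")
    return problems


# Hypothetical renderers: payload shapes are invented for illustration.
def render_vendor_a(slo: SloDefinition) -> dict:
    return {
        "title": slo.name,
        "objective_pct": round(slo.target * 100, 4),
        "timeframe": f"{slo.window_days}d",
        "numerator": slo.good_query,
        "denominator": slo.total_query,
        "tags": [f"service:{slo.service}"],
    }


def render_vendor_b(slo: SloDefinition) -> dict:
    return {
        "slo": {"id": slo.name, "service": slo.service},
        "goal": slo.target,
        "rolling_days": slo.window_days,
        "sli": {"good": slo.good_query, "valid": slo.total_query},
    }


# Example definition; in practice this would be loaded from files in git.
checkout = SloDefinition(
    name="checkout-availability",
    service="checkout",
    target=0.999,
    window_days=28,
    good_query="placeholder_good_events_query",
    total_query="placeholder_total_events_query",
)

if not validate(checkout):
    for render in (render_vendor_a, render_vendor_b):
        print(json.dumps(render(checkout), indent=2))
```

Switching vendors in this arrangement means writing one new renderer, while the definitions and their review history stay where they are. Features a vendor offers outside the neutral schema still have to live in that vendor's layer.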
Plan
The vendor decision is multi-year. Plan for it as such, not as a tactical tool selection. The questions to answer up front are different from the questions you answer in a tool evaluation.
- Multi-year tool decisions: The choice of SLO platform commits you for at least 3 years, more likely 5+. Evaluate accordingly: not just "which tool is best today" but "which tool will still be the right choice in 5 years given how the team will grow."
- Lock-in cost as a deciding factor: Two tools with similar features can have very different lock-in profiles. The one with vendor-portable definitions, exportable history, and configuration-as-code support is worth a meaningful price premium because the optionality is worth real money.
- Open-source backstop: A self-hosted Prometheus + Sloth + Grafana stack is the open-source SLO backstop. It is more work to operate than a vendor solution, but it removes the lock-in question entirely. For teams with strong Kubernetes operations, this is a valid choice. For teams without, it is usually not.
- Multi-vendor for portfolio resilience: Some larger orgs deliberately use one vendor for some services and another for others. The cost is operational complexity; the benefit is empirical understanding of what each vendor is actually good at, plus a credible threat to switch when contracts come up.
- Re-evaluate at contract renewal: Annual contract renewals are the natural time to revisit the choice. Even if the team does not switch, the alternative-evaluation exercise informs the renegotiation. Vendors price-discriminate; a credible alternative is the lever that wins better terms.
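The claim that a portable tool can justify a price premium is easy to sanity-check with rough arithmetic. The Python sketch below is one way to frame it; every number is an assumption chosen for illustration (license prices, a cost of 20,000 per engineer-month, 9 engineer-months to leave the locked-in tool versus 1 for the portable one), and the model ignores discounting and negotiation effects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolOption:
    """One candidate platform (all figures are hypothetical inputs)."""
    name: str
    annual_license: float          # currency units per year
    switch_engineer_months: float  # effort needed to leave this tool later


def expected_cost(option: ToolOption, years: int,
                  engineer_month_cost: float,
                  switch_probability: float) -> float:
    """License spend over the horizon plus probability-weighted exit cost."""
    license_total = option.annual_license * years
    exit_cost = option.switch_engineer_months * engineer_month_cost
    return license_total + switch_probability * exit_cost


def break_even_probability(locked: ToolOption, portable: ToolOption,
                           years: int, engineer_month_cost: float) -> float:
    """Switch probability above which the portable option is cheaper."""
    premium = (portable.annual_license - locked.annual_license) * years
    exit_saving = ((locked.switch_engineer_months
                    - portable.switch_engineer_months) * engineer_month_cost)
    if exit_saving <= 0:
        raise ValueError("portable option must be cheaper to leave")
    return premium / exit_saving


# Assumed inputs for a 5-year horizon.
locked = ToolOption("locked-in", annual_license=100_000,
                    switch_engineer_months=9)
portable = ToolOption("portable", annual_license=110_000,
                      switch_engineer_months=1)

for option in (locked, portable):
    cost = expected_cost(option, years=5, engineer_month_cost=20_000,
                         switch_probability=0.5)
    print(f"{option.name}: {cost:,.0f}")

print("break-even switch probability:",
      break_even_probability(locked, portable, 5, 20_000))
```

Under these assumptions a 10% license premium pays for itself once the chance of wanting to switch within five years exceeds about 31%. Plugging in your own prices and effort estimates turns "optionality is worth real money" into a number you can bring to a renewal conversation.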
SLO platform choice is one of the most consequential tooling decisions an engineering org makes. Nova AI Ops integrates with OpenSLO definitions natively, supports import and export of SLO config in vendor-portable formats, and helps teams keep their SLO practice in a state where vendor switching remains a real option rather than a theoretical one.