SLO for Feature-Flagged Paths
New features have separate SLOs.
Idea
Most teams compute SLOs at the service level: total successful requests divided by total requests across every code path. That works for stable services but obscures problems with new features behind flags. A new feature shipped to 5% of users could return 30% errors and the service-level SLO would barely move. The fix is per-flag SLOs during the rollout window: each new feature has its own SLO until it has graduated.
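The dilution arithmetic can be made concrete with a small sketch; the 0.1% baseline error rate is an assumed illustrative value:

```python
def global_error_rate(base_error: float, flag_error: float, flag_share: float) -> float:
    """Blend the stable path's error rate with the flagged path's,
    weighted by the share of traffic the flag receives."""
    return (1 - flag_share) * base_error + flag_share * flag_error

# A 30% error rate on a 5% rollout barely moves a service running at 0.1% errors:
blended = global_error_rate(base_error=0.001, flag_error=0.30, flag_share=0.05)
print(f"{blended:.3f}")  # roughly 0.016: under 2% globally, versus 30% on the flagged path
```

The flagged path's 30% error rate contributes only 0.05 × 0.30 = 1.5% to the global number, which is why the service-level SLO barely registers the failure.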
What per-flag SLOs look like in practice:
- Per-flag SLO during ramp: When a feature is rolled out behind a flag, the requests served by that flagged code path are tagged with the flag identifier. The SLO calculation runs separately for those requests, so the new feature has its own availability, latency, and error-rate numbers.
- Catches feature-specific issues fast: A new feature returning 30% errors at 5% rollout would barely move the service SLO (a 1.5% global error contribution). The per-flag SLO sees the 30% error rate immediately, giving the team the signal it needs to roll back or fix.
- Same dimensions as the service SLO: The per-flag SLO measures the same things as the service SLO: availability, latency, error rate, freshness. Only the scope differs; the mechanism is the same metric pipeline, tagged with flag attribution.
- Burn-rate alerts on the flagged path: While the feature is in ramp, alerts fire on the flagged path's burn rate, not on the global service SLO. This catches issues that would never trigger a service-level alert because the affected request volume is too small.
- Pre-defined SLO target before rollout: The team sets the per-flag SLO target, usually equal to the service SLO target, before flipping the flag on. Setting it ahead of rollout makes the success criteria explicit rather than improvised after the fact.
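The tagging-and-scoping mechanism described above can be sketched minimally; the request shape and flag name are invented for illustration:

```python
from collections import defaultdict

def slo_scopes(requests):
    """Success ratio per scope: the whole service plus each flagged path.
    Each request is a dict like {"flag": "new-checkout" or None, "ok": bool}."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for r in requests:
        scopes = ["service"]                      # every request counts service-wide
        if r.get("flag"):
            scopes.append("flag:" + r["flag"])    # flagged requests also count per-flag
        for scope in scopes:
            totals[scope] += 1
            successes[scope] += int(r["ok"])
    return {s: successes[s] / totals[s] for s in totals}

# 950 healthy stable requests plus a 50-request flagged path failing 30% of the time.
reqs = (
    [{"flag": None, "ok": True}] * 950
    + [{"flag": "new-checkout", "ok": True}] * 35
    + [{"flag": "new-checkout", "ok": False}] * 15
)
ratios = slo_scopes(reqs)
print(ratios)  # service availability stays at 98.5%, while the flagged scope shows 70%
```

The same pipeline computes both numbers; the per-flag scope is just an extra grouping key on already-tagged traffic.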
Per-flag SLOs are the missing piece that makes feature flag rollouts safe at scale. Without them, the only signal teams have is global metrics that drown out the new code path's specific behavior.
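The burn-rate alerting on the flagged path can be sketched as follows; the 99.9% target is illustrative, and the key property is that the per-flag burn rate is independent of the flag's traffic share:

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Error-budget burn rate: 1.0 consumes the budget exactly over the
    SLO window; higher values exhaust it proportionally faster."""
    return error_rate / (1.0 - slo_target)

# The flagged path burns budget at ~300x against a 99.9% target,
# regardless of what share of traffic the flag carries.
flagged = burn_rate(0.30, 0.999)

# The same incident diluted into global traffic at 5% rollout looks
# 20x smaller from the service-level view.
diluted = burn_rate(0.015, 0.999)

print(round(flagged), round(diluted))  # → 300 15
```

Alerting on the flagged scope's burn rate surfaces the full severity immediately, where the diluted global view scales down with the rollout percentage.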
Retire
Per-flag SLOs are temporary. They exist for the rollout window, not forever. Once the feature has graduated to full traffic and the flag is being retired, the flag-specific SLO retires with it. Carrying flag-specific SLOs forever produces dashboard clutter and false signals.
- After full rollout, retire the flag-specific SLO: When the flag is at 100% and stable, the flagged code path becomes the regular code path. The SLO calculation that was tagged by flag attribution merges back into the service SLO; no more separate dashboard tile.
- Merge into the service SLO definition: The behavior the flag was protecting is now part of the service, and the service SLO covers it as normal traffic. The dimensions and targets do not change; the scope does.
- Track the retirement: Just like flag retirement, SLO retirement is a deliberate event: documented, dated, owner notified. Without that discipline, retired flags leave behind orphaned SLO calculations that quietly run against zero data.
- Same lifecycle as the flag: The per-flag SLO is born when the flag is created, active during ramp, and retired when the flag reaches full rollout or is removed. The lifecycle aligns with the flag itself; the two artifacts are sibling components of the same feature.
- Catalog of retired flags: The retro on the rollout (what the per-flag SLO showed during ramp, whether the targets were hit, what was learned) is captured. The catalog of retired flags becomes a knowledge base on which kinds of changes hit which kinds of issues during ramp.
The retirement discipline is what keeps per-flag SLOs from becoming the next generation of dashboard sprawl. Each flag's SLO has a beginning, a useful period, and an end.
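One way to model the shared lifecycle is to make the per-flag SLO an artifact with explicit stages and a dated, owner-attributed retirement; the field names and stages below are a hypothetical sketch, not a real provider API:

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

class Stage(Enum):
    RAMPING = "ramping"   # flag partially rolled out; per-flag SLO active
    FULL = "full"         # flag at 100%; scope ready to merge into the service SLO
    RETIRED = "retired"   # flag and its SLO both gone

@dataclass
class FlagSLO:
    flag: str
    target: float                     # usually set equal to the service SLO target
    owner: str
    stage: Stage = Stage.RAMPING
    retired_on: Optional[date] = None

    def retire(self, on: date) -> None:
        """Retirement is a deliberate, dated event, so no orphaned
        calculation keeps running against zero data."""
        self.stage = Stage.RETIRED
        self.retired_on = on

slo = FlagSLO(flag="new-checkout", target=0.999, owner="payments-team")
slo.retire(date(2025, 3, 1))
print(slo.stage.value, slo.retired_on.isoformat())  # → retired 2025-03-01
```

Treating the SLO as a first-class record with a retirement date is what makes the catalog of retired flags queryable after the fact.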
Benefit
Per-flag SLOs change how teams ship new features. The benefits compound across the engineering organization.
- Faster feature feedback: Issues with new code paths surface within minutes rather than hours. The flag rollout that would have caused a service-level incident at 50% traffic gets caught at 5% by the per-flag SLO. Mean time to detect drops dramatically for feature-specific bugs.
- Per-feature accountability: The team that shipped the flagged feature owns its SLO during ramp. They cannot point to a healthy service-level SLO as cover for their feature's bad behavior; the accountability is tied directly to the change.
- Confidence to ship faster: When teams know the per-flag SLO will catch problems, they roll out features more aggressively. Detection speed is higher, so rollout speed can be too, and aggregate feature velocity goes up.
- Better data for retros: When a flag rollout has problems, the per-flag SLO data is the postmortem evidence: specific numbers on the error rate the new code produced, how it compared to the old code, and what the rollout schedule looked like.
- Catches subtle regressions: Some new features do not break things obviously; they just make them slightly worse. A new code path with a 200 ms higher p99 latency would be invisible in the service SLO at 5% rollout but very visible in the per-flag SLO. Subtle regressions stop graduating quietly.
Per-flag SLOs are one of the highest-leverage observability investments a team can make for any product that ships behind feature flags. Nova AI Ops integrates with feature flag providers (LaunchDarkly, Unleash, Statsig) to tag traffic by flag, computes per-flag SLOs during the rollout window, and retires the calculation when the flag itself retires so the dashboard stays focused on what is actually rolling out today.