Feature Flag Discipline 2026
Flags accumulate. This is the discipline that keeps them from becoming a liability.
TTL
The single biggest reason feature flags become a liability is that they have no expiration. A flag rolled out for a 2-week experiment in 2023 is still in the codebase in 2026, still gating a code path, still causing intermittent surprises when someone toggles it without remembering what it does. The cure is non-negotiable: every flag has a death date.
What flag TTL discipline looks like:
- Every flag declares its TTL. When the flag is created, the owner sets a planned removal date. 30 days for an A/B test. 90 days for a phased rollout. 6 months for a long-running operational kill switch. The TTL is metadata stored in the flag service itself.
- Owner accountable for the death. The person who created the flag is on the hook for removing it. When the TTL expires, the owner is notified (Slack, email, or ticket) and must either remove the flag or explicitly extend it with a new TTL. Silent expiration with no action is not allowed.
- Extension requires justification. Extending a TTL is allowed but must include a reason and a new target date. "We need this longer" is not a reason; "we are still rolling out, current state is 60% traffic on, expected full rollout in 30 days" is. The extension log keeps a record of why each flag is still alive and who decided to keep it.
- Permanent flags labeled explicitly. Some flags are intentionally permanent (kill switches, large-account toggles, regional configs). These get a "permanent" label and a separate review cycle, not a TTL. The label is the discipline; pretending a permanent flag is temporary just for paperwork is the same as having no TTL at all.
The TTL is the forcing function. Without it, every flag added is a permanent addition to the technical debt pile.
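The TTL rules above can be sketched as a small record type. This is a minimal illustration, not any real flag service's API: the `Flag` class, its field names, and the `DEFAULT_TTL_DAYS` table are hypothetical, "6 months" is approximated as 180 days, and the 20-character minimum on the reason is only a crude stand-in for a human judging whether a justification is concrete.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# Default TTLs by flag kind, taken from the examples in the text (6 months ~ 180 days).
DEFAULT_TTL_DAYS = {"ab_test": 30, "phased_rollout": 90, "kill_switch": 180}


@dataclass
class Flag:
    name: str
    owner: str                       # the person accountable for removal
    kind: str
    created: date
    permanent: bool = False          # permanent flags get a review cycle, not a TTL
    expires: Optional[date] = None   # planned removal date
    extensions: list = field(default_factory=list)  # (logged_on, reason, new_expiry)

    def __post_init__(self):
        if self.permanent:
            self.expires = None
        elif self.expires is None:
            self.expires = self.created + timedelta(days=DEFAULT_TTL_DAYS[self.kind])

    def extend(self, today: date, new_expiry: date, reason: str) -> None:
        """Extend the TTL; a reason and a future target date are mandatory."""
        if self.permanent:
            raise ValueError("permanent flags have no TTL to extend")
        if len(reason.strip()) < 20:  # crude proxy for "not a real reason"
            raise ValueError("extension needs a concrete justification")
        if new_expiry <= today:
            raise ValueError("new target date must be in the future")
        self.extensions.append((today, reason, new_expiry))
        self.expires = new_expiry

    def is_stale(self, today: date) -> bool:
        """A flag is stale once it is past its TTL; permanent flags never are."""
        return not self.permanent and today > self.expires
```

Keeping the extension log on the record itself means the history of why a flag is still alive travels with the flag rather than living in a separate ticket system.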
Monitor
The TTL discipline only works if someone is actually watching for stale flags. The watch has to be automated, visible, and routine, or it does not happen.
- Stale flags surfaced on a dashboard. The flag service has a "stale flags" view: every flag past its TTL, ordered by how long it has been overdue. The dashboard is visible to engineering leadership. The number on it is a leading indicator of system hygiene.
- Quarterly cleanup ritual. Once a quarter, the team that owns the most overdue flags gets time on the roadmap to clean them up. This becomes routine after the first two cycles. Without the ritual, cleanup is always something to do "next sprint."
- Code-level inventory. Some flags die in the flag service but their callers still exist in code as dead branches. Static analysis or linters catch the orphaned call sites. The flag is not "removed" until the code that referenced it is also removed.
- Per-team metrics. Number of active flags, average flag age, number of flags overdue, percentage of flags removed within 30 days of TTL. Each is a number per team. Visible. The team with 200 stale flags cannot pretend the problem is somewhere else.
- Notify on expiration. Two weeks before TTL, the owner gets a friendly nudge. On TTL day, an automatic ticket opens. Two weeks past TTL, the ticket escalates to the owner's manager. The escalation is mechanical and impersonal; it makes cleanup the path of least resistance.
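The stale-flags view and the notification schedule are both simple date arithmetic. The sketch below assumes flags are available as `(name, expires)` pairs; the function names and stage labels are hypothetical, and a real setup would wire the stages to whatever chat and ticketing tools are in use.

```python
from datetime import date, timedelta

NUDGE_BEFORE = timedelta(weeks=2)    # friendly reminder window before TTL
ESCALATE_AFTER = timedelta(weeks=2)  # grace period before the manager is looped in


def escalation_stage(expires: date, today: date) -> str:
    """Map a flag's TTL date to the notification stage it is currently in."""
    if today >= expires + ESCALATE_AFTER:
        return "escalate_to_manager"
    if today >= expires:
        return "ticket_open"
    if today >= expires - NUDGE_BEFORE:
        return "nudge_owner"
    return "none"


def stale_view(flags: list[tuple[str, date]], today: date) -> list[tuple[str, int]]:
    """Flags past their TTL as (name, days_overdue), most overdue first."""
    overdue = [(name, (today - expires).days) for name, expires in flags if today > expires]
    return sorted(overdue, key=lambda item: item[1], reverse=True)
```

Because the stage is computed from the dates alone, a daily job can recompute it for every flag without storing any notification state.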
Monitoring is the difference between a flag practice that ages well and one that turns the codebase into a maze of conditional branches whose business meaning has been lost.
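For the code-level inventory, even a rough text scan can surface orphaned call sites. The sketch below assumes flags are checked through a hypothetical `is_enabled("flag-name")` call; the pattern would need adjusting to the SDK actually in use, and a proper linter or AST-based check would be more robust than a regular expression.

```python
import re

# Hypothetical call pattern: is_enabled("flag-name"). Adjust to the real SDK.
FLAG_CALL = re.compile(r"""is_enabled\(\s*["']([\w.-]+)["']""")


def orphaned_call_sites(sources: dict[str, str], live_flags: set[str]) -> list[tuple[str, int, str]]:
    """Return (path, line_number, flag_name) for calls to flags no longer in the service."""
    orphans = []
    for path, text in sorted(sources.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name in FLAG_CALL.findall(line):
                if name not in live_flags:
                    orphans.append((path, lineno, name))
    return orphans
```

Run in CI with `live_flags` exported from the flag service, a non-empty result can fail the build until the dead branch is deleted.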
Limit
The strongest discipline is a hard cap on how many stale flags a team is allowed to have. A cap forces cleanup to be routine; without one, cleanup is always optional and always loses to feature work.
- Max stale flags per team. Pick a reasonable cap (5, 10, 25, depending on team size). Once the team's stale flag count hits the cap, no new flags can be added until an old one is removed. The constraint is automated by the flag service.
- Cap encourages cleanup. When adding a flag is gated by removing one, the team naturally maintains their flag inventory. The hard cap removes the option to defer; the only path forward is cleanup.
- Cap can be raised, but visibly. Sometimes a team has a legitimate need for more flags (a major feature with multiple gates). Raising the cap requires a written justification and a target date for return to baseline. The exception is documented and reviewed.
- Healthy norm: single digits. A mature team running a feature flag practice well has 3 to 7 active flags at any time. Anything above 25 is a sign that flags are being added but not retired. The norm becomes the team's working rhythm.
- Inventory follows the service across team boundaries. When a service moves between teams, the flag inventory and the cap move with it. The new owner inherits the count and the obligation. This prevents the pattern where a team accumulates flags, hands the service off, and the cleanup obligation disappears in the transfer.
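A hard cap is easy to enforce at flag-creation time. The sketch below is one possible shape, assuming a hypothetical in-memory `FlagRegistry`; a real flag service would persist this state, and the cap-raising workflow with its written justification is left out for brevity.

```python
from collections import defaultdict


class FlagCapExceeded(Exception):
    """Raised when a team at its stale-flag cap tries to add a flag."""


class FlagRegistry:
    def __init__(self, caps: dict[str, int]):
        self.caps = dict(caps)          # team -> max stale flags allowed
        self.stale = defaultdict(set)   # team -> names of stale flags

    def mark_stale(self, team: str, flag: str) -> None:
        self.stale[team].add(flag)

    def remove(self, team: str, flag: str) -> None:
        self.stale[team].discard(flag)

    def can_add_flag(self, team: str) -> bool:
        return len(self.stale[team]) < self.caps[team]

    def add_flag(self, team: str, flag: str) -> str:
        """Refuse new flags once the team's stale count has hit its cap."""
        if not self.can_add_flag(team):
            raise FlagCapExceeded(
                f"{team} is at its cap of {self.caps[team]} stale flags; remove one first"
            )
        return flag

    def transfer(self, flags: set[str], from_team: str, to_team: str) -> None:
        """Move stale flags with a service so the new owner inherits the count."""
        moved = self.stale[from_team] & flags
        self.stale[from_team] -= moved
        self.stale[to_team] |= moved
```

For example, a team with a cap of 2 and two stale flags gets `FlagCapExceeded` from `add_flag` until one of the stale flags is removed.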
Feature flags are powerful and dangerous in equal measure. The discipline of TTL, monitoring, and a hard cap is what keeps the power and contains the danger. Nova AI Ops watches flag age, surfaces overdue cleanups, tracks per-team flag counts against the cap, and pings owners when their flags pass TTL so the discipline runs itself instead of being a quarterly emergency.