Best Practices · Intermediate · By Samson Tanimawo, PhD · Published Mar 17, 2026 · 6 min read

Progressive Delivery: Feature Flags Beyond On/Off

A feature flag that flips for everyone at once is just a slow deploy. Real progressive delivery splits a release into five or six controlled steps where each step has its own kill switch.

From flip to rollout

The first feature flag everyone writes is a boolean: on or off. The second is targeted to a few internal users. After that the patterns multiply, and each one catches a different class of bug.

Progressive delivery is the discipline of taking a binary release and turning it into a controlled sequence of expansions, each one a chance to detect a problem before the blast radius gets big. The boolean flag is the most basic form; mature teams compose multiple flag types into release plans that look more like a runbook than a code change.

The reason matters. A simple flip means a regression hits 100% of users immediately. A 5-step rollout means a regression hits at most one step's worth of users before someone catches it. The math is the same as for canary deployments: exposure is bounded by the size of the current stage. The infrastructure to run this exists in every modern feature-flag service; the discipline of using it is the gap.

Percentage rollouts

1%, 5%, 25%, 50%, 100%. The most common pattern, and the most common one done badly. The trick is not the percentages; it is the dwell time. Watching error rates for 10 minutes between steps catches almost nothing. Watch for at least one full request-pattern cycle (typically 1-2 hours).
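Before the dwell-time question, the percentage mechanism itself needs one property: a user's bucket must be deterministic, so their assignment is stable across requests and across steps. A minimal sketch, assuming a salted-hash scheme (illustrative, not any particular platform's implementation):

```python
import hashlib

def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in [0, 100) using a salted hash.
    Salting with the flag name decorrelates buckets across flags."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def is_enabled(user_id: str, flag_name: str, rollout_percent: int) -> bool:
    """A user is in the rollout if their bucket falls below the current
    step. Because buckets are stable, raising the percentage only adds
    users; it never reshuffles the ones already enabled."""
    return bucket(user_id, flag_name) < rollout_percent
```

The monotonic property is what makes the step sequence meaningful: everyone enabled at 5% stays enabled at 25%, so each step strictly widens the exposed population.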

Why dwell time matters. Many bugs only manifest under specific load patterns: peak hour, batch job kicking off, cache eviction cycle. A 10-minute window misses all of these. Two hours catches the major patterns; 24 hours catches the once-a-day patterns. The percentage steps are cheap; the dwell times are where reliability is built or lost.

The leverage move: tie dwell time to a metric, not a clock. "Move to next step when error rate has been within 0.05% of baseline for 1 hour" is more rigorous than "wait 1 hour." Most modern flag platforms support this; the team that wires it up gets automatic protection against regressions that take hours to surface.
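The metric-gated advance check can be sketched in a few lines. This assumes error-rate samples arrive as (timestamp, rate) pairs; the function name, tolerance, and sampling-slack value are all illustrative:

```python
def ready_to_advance(samples, baseline, tolerance=0.0005, window_s=3600):
    """samples: list of (timestamp, error_rate) pairs, oldest first.
    True only if the trailing `window_s` seconds are fully covered by
    samples AND every sample in that window is within `tolerance` of
    baseline. Recent calm after an unsampled gap does not count."""
    if not samples:
        return False
    latest = samples[-1][0]
    window = [(t, r) for t, r in samples if t >= latest - window_s]
    # Require coverage back to the start of the window (60s sampling slack).
    if window[0][0] > latest - window_s + 60:
        return False
    return all(abs(r - baseline) <= tolerance for _, r in window)
```

The key design choice is the coverage check: without it, a rollout that paused metric collection for an hour would "pass" on a single calm sample, which is exactly the failure mode the clock-based wait has.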

Cohort rollouts

Internal users, then beta users, then by tenant size, then everyone. Cohort rollouts catch issues that percentage rollouts miss because some user types simply do not exercise the new code path until you give it to them. A 5% rollout that excludes your enterprise tenants is a 0% rollout for the only customers who would actually break.

The cohort hierarchy matters. Internal users have the most tolerance for bugs and the fastest feedback loop (Slack the engineer directly). Beta users have moderate tolerance and have signed up for early access. Free-tier users are next, then paid-tier, then enterprise. Each tier has different stakes; each catches different bug classes.

The trap. Skipping the enterprise cohort because it's the smallest. Enterprise users often have the heaviest customisation and the most edge cases. They're the ones whose specific data shape will break the new code path. Always include enterprise as a deliberate cohort step before 100%, even if the percentage is tiny.
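The cohort hierarchy above can be encoded as an ordered list, so "the rollout has reached stage X" enables every cohort up to and including X. A minimal sketch; the tier names follow the hierarchy in this section, but the function and constant names are hypothetical:

```python
# Ordered from most bug-tolerant to least; mirrors the hierarchy above.
COHORT_ORDER = ["internal", "beta", "free", "paid", "enterprise"]

def cohort_enabled(user_cohort: str, rollout_stage: str) -> bool:
    """A user sees the feature once the rollout stage has reached their
    cohort: stage 'beta' enables internal + beta, 'enterprise' enables
    everyone. Enterprise is a deliberate final step before 100%."""
    return COHORT_ORDER.index(user_cohort) <= COHORT_ORDER.index(rollout_stage)
```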

Geographic rollouts

Roll one region at a time. Slower, but catches the failures specific to a region: latency to a regional database, locale-specific input, regulatory edge cases. Geographic rollouts pair well with cohort: us-east internal first, then us-east beta, etc.
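The region-then-cohort pairing is just a region-major cross product: finish every cohort within one region before starting the next. A sketch, with illustrative region and cohort names:

```python
from itertools import product

def rollout_sequence(regions, cohorts):
    """Pair each region with each cohort, region-major: every cohort in
    one region completes before the next region starts, so region-specific
    failures surface while only one region is exposed."""
    return [f"{region}/{cohort}" for region, cohort in product(regions, cohorts)]
```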

The latency-related bugs are the surprising ones. A feature that works in us-east-where-the-database-also-lives can fail in eu-west when the cross-region latency exceeds the request timeout. Pure percentage rollouts pick random users globally and may not exercise this path at all; geographic rollouts make it impossible to miss.

Regulatory edge cases. EU data residency rules, US payment processing rules, APAC data localisation. A feature that processes user data in a way that's fine in one region may be illegal in another. Geographic rollout exposes this before the regulator does.

Dependency-gated rollouts

The new feature stays off until its dependencies (a new index, a migrated table, a new microservice) are at version X. Without this, a 1% percentage rollout fires before the database it depends on has finished migrating, and you spend an hour debugging a race that never had to exist.

The pattern: the flag's "enabled" check is a logical AND of the percentage rollout AND the dependency check. The dependency check is automated — it queries an API or reads a config to verify the dependency is ready. The flag stays "off" for any user where the dependency isn't ready, even if the percentage rolled them in.
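The logical-AND pattern is short enough to sketch directly. Each dependency check is a zero-arg probe (a migration-status endpoint, a service-version query); the function names and hashing scheme here are illustrative:

```python
import hashlib

def in_percentage(user_id: str, flag: str, percent: int) -> bool:
    """Stable hash bucket in [0, 100), compared against the current step."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

def enabled(user_id: str, flag: str, percent: int, dependency_checks) -> bool:
    """Logical AND of the dependency gate and the percentage rollout.
    Any failing probe forces the flag off for everyone, even users the
    percentage would have rolled in -- no ordering race is possible."""
    if not all(check() for check in dependency_checks):
        return False
    return in_percentage(user_id, flag, percent)
```

Checking dependencies first also means the (possibly slow) probes can be cached globally, while the per-user percentage check stays cheap.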

Where this matters most. Database migrations that take days to complete on large tables. Microservice rollouts that are themselves multi-step. Feature flags whose backend service is being deployed alongside. Each is a case where the simple percentage rollout creates ordering races; dependency gating eliminates them.

The order to roll them out

Start with percentage. Add cohort second (it costs you little more than a user-segment attribute in your database). Geographic third, when your traffic is genuinely multi-region. Dependency-gated last, when you have enough infrastructure changes to need it.

The reasoning. Percentage is the easiest to implement and gets you the most basic protection. Cohort is cheap to add but requires user-segment logic; only worth it once you have user types that genuinely differ. Geographic requires regional infrastructure; meaningless for single-region teams. Dependency-gated is complex; only worth it when you have multi-step releases.

Adopting in order keeps the team's progressive-delivery muscle building. The first month, percentage rollouts catch most regressions. The second quarter, the team adds cohort and catches enterprise-specific issues. By year-end, all four patterns are routine. Trying to adopt all four at once produces a complex system that no one understands.

Halting cleanly

Every step needs a kill switch that returns to the previous step instantly. If your kill switch requires a deploy, you do not have a kill switch. You have a slower deploy.

The discipline: the kill switch is the SAME flag, set to a SMALLER percentage. Going from 25% back to 5% should be a config change in the flag platform, not a code change in the application. Most modern flag services make this trivial; teams that build their own often skip the instant-rollback feature and pay for it later.
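The advance/halt state machine is small enough to sketch. In a real platform this record lives in the flag service and changes via its API, never via an application deploy; the class and step values here are illustrative:

```python
STEPS = [1, 5, 25, 50, 100]  # percentage per step

class Rollout:
    """In-memory sketch of a flag's rollout state: one pointer into the
    step ladder. Halting is the same operation as advancing, reversed --
    a config change, not a code change."""
    def __init__(self):
        self.step = 0  # start at 1%

    @property
    def percent(self) -> int:
        return STEPS[self.step]

    def advance(self):
        self.step = min(self.step + 1, len(STEPS) - 1)

    def halt(self):
        """Kill switch: snap back to the previous step instantly."""
        self.step = max(self.step - 1, 0)
```

Combined with stable hash bucketing, halting from 25% back to 5% removes exactly the users the last advance added, which is what makes the rollback meaningful rather than a reshuffle.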

The vocabulary. Each flag has a current step (5%, 25%, 50%, 100%). Going backwards is "rolling back" or "halting"; going forwards is "advancing" or "rolling out." The verbs are different; the engineering discipline is to know which direction you're moving and why.

Common antipatterns

The "it's just a flag" mindset. Engineers ship without thinking about the rollout plan, then deal with regressions in production. The flag is leverage, not magic. Plan the rollout before merging.

Stale flags. Feature shipped at 100%, flag never deleted. Six months later the flag is undocumented technical debt. Every rollout should have a "remove the flag" task scheduled for after 100%.

Nested flags that depend on each other. Flag A controls a feature; flag B controls a sub-component of it; flag C controls a sub-sub-component. Three levels of flags create 2³ = 8 possible states; testing all 8 combinations is rarely feasible. Flatten the hierarchy.

Flags as long-term feature gates. "We'll keep this behind a flag for premium tier only." Now the flag is part of the product surface; it requires the same testing as code. Flags are for rollouts, not for permanent feature differentiation.

What to do this week

Three moves. (1) For the next non-trivial feature, write the rollout plan before merging the code. The plan should specify percentage steps, dwell times per step, the cohort sequence if applicable, and the rollback criteria. (2) Audit your current flag inventory. Anything at 100% rollout for over 30 days should have a removal ticket. (3) Set a team norm: the kill switch is tested in staging before any rollout enters production. The team that confirms the rollback works before they need it doesn't discover at 3am that the rollback path was broken.
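A rollout plan written before the merge can be as simple as a checked-in data structure. The field names, flag name, and thresholds below are hypothetical, shown only to make "write the plan first" concrete:

```python
# Hypothetical rollout plan, reviewed alongside the feature's PR.
# Field names are illustrative, not any flag platform's schema.
ROLLOUT_PLAN = {
    "flag": "new-checkout-flow",
    "steps": [
        {"percent": 1,   "dwell": "2h"},
        {"percent": 5,   "dwell": "24h"},  # catch once-a-day load patterns
        {"percent": 25,  "dwell": "2h"},
        {"percent": 50,  "dwell": "2h"},
        {"percent": 100, "dwell": None},
    ],
    "cohorts": ["internal", "beta", "enterprise"],
    "rollback_if": "error rate > baseline + 0.05% for 5 consecutive minutes",
    "cleanup_ticket_due": "30 days after 100%",
}
```

Putting the cleanup date in the plan itself is what keeps the stale-flag antipattern from recurring: removal is scheduled the day the rollout starts, not remembered six months later.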