The Dark Launch Validation Pattern
Run the new code in production without exposing it to users. The pattern, the metrics, and what dark launches catch before the real launch.
The shape
A dark launch runs the new code path against real production traffic without exposing the result to users. The old path keeps serving; the new path runs in the shadow.
- Mirrored traffic. Each production request is sent to both the old and the new code path.
- Output captured, not returned. The new code's response is logged and compared, never sent to the user.
- Compare outputs. Where new and old disagree, investigate; that diff is the bug surface.
- Real capacity. The new code handles real load before any real customer depends on it; capacity surprises surface here.
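A minimal Python sketch of that shape, assuming the old and new paths are plain in-process functions and that the new path has no side effects (a shadow path that writes to shared state would need its writes suppressed or redirected). `DarkLaunch`, `ShadowStats`, and the demo pricing functions are hypothetical names, not any library's API.

```python
import concurrent.futures
import logging
import threading
from dataclasses import dataclass, field
from typing import Any, Callable

log = logging.getLogger("dark_launch")


@dataclass
class ShadowStats:
    """Counters for the shadow path (hypothetical structure)."""
    total: int = 0
    diffs: int = 0
    new_errors: int = 0
    mismatches: list = field(default_factory=list)


class DarkLaunch:
    """Serve from `old`, run `new` in the background, compare, never return `new`'s output."""

    def __init__(self, old: Callable[[Any], Any], new: Callable[[Any], Any], workers: int = 4):
        self.old = old
        self.new = new
        self.stats = ShadowStats()
        self._lock = threading.Lock()
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=workers)

    def handle(self, request: Any) -> Any:
        # The old path runs first and its result is what the user gets.
        old_result = self.old(request)
        # The new path runs on a worker thread; its output is captured, not returned.
        future = self._pool.submit(self.new, request)
        future.add_done_callback(lambda f: self._compare(request, old_result, f))
        return old_result

    def _compare(self, request: Any, old_result: Any, future: concurrent.futures.Future) -> None:
        error = future.exception()
        with self._lock:  # callbacks may run on several worker threads
            self.stats.total += 1
            if error is not None:
                self.stats.new_errors += 1
                log.warning("shadow error for %r: %r", request, error)
            elif future.result() != old_result:
                # The diff is the bug surface: keep enough context to investigate.
                self.stats.diffs += 1
                self.stats.mismatches.append((request, old_result, future.result()))

    def close(self) -> None:
        # Wait for in-flight shadow calls so the stats are complete.
        self._pool.shutdown(wait=True)


# Demo: prices in cents; the new path has a bulk-discount bug and rejects empty orders.
def old_total(req: dict) -> int:
    return req["qty"] * req["unit_cents"]


def new_total(req: dict) -> int:
    if req["qty"] == 0:
        raise ValueError("empty order")
    total = req["qty"] * req["unit_cents"]
    return total * 9 // 10 if req["qty"] > 100 else total


dl = DarkLaunch(old_total, new_total)
served = [dl.handle({"qty": q, "unit_cents": 999}) for q in (1, 5, 200, 0)]
dl.close()
print(served, dl.stats.total, dl.stats.diffs, dl.stats.new_errors)
```

Calling the old path first means a slow or failing new path cannot delay or break the user's response. In production the mirroring is often done at a proxy or load balancer rather than in-process, which this sketch does not model.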
What to compare
Four metrics turn shadow traffic into a quantitative go/no-go decision. Without them, a dark launch runs on vibes.
- Output diffs. Disagreement rate between new and old; above 1% usually means a real bug, not noise.
- Latency. p99 in particular; if the new code blows the latency budget under shadow load, it will under real load.
- Error rate. Is the new code throwing more 5xx errors than the old? Track both the absolute count and the rate.
- Cost. Per-request CPU, memory, and external API spend; cheaper is a feature, more expensive is a budget conversation.
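One way to fold the four metrics into a single go/no-go check, sketched in Python. The 1% diff threshold comes from the list above; the `Sample` schema, the `latency_budget_ms` parameter, the rule that any increase in 5xx count blocks, and the nearest-rank p99 are illustrative assumptions to tune per service.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One mirrored request as logged by the comparison step (hypothetical schema)."""
    diff: bool         # new output disagreed with old output
    old_ms: float      # old-path latency
    new_ms: float      # new-path latency under shadow load
    old_error: bool    # old path returned a 5xx
    new_error: bool    # new path returned a 5xx
    old_cost: float    # per-request cost in any consistent unit, e.g. CPU-seconds
    new_cost: float


def p99(values: list) -> float:
    """Nearest-rank 99th percentile."""
    ordered = sorted(values)
    rank = math.ceil(len(ordered) * 99 / 100)
    return ordered[rank - 1]


def go_no_go(samples: list, latency_budget_ms: float, max_diff_rate: float = 0.01) -> dict:
    if not samples:
        raise ValueError("no shadow samples yet")
    n = len(samples)
    old_errors = sum(s.old_error for s in samples)
    new_errors = sum(s.new_error for s in samples)
    old_cost = sum(s.old_cost for s in samples)
    report = {
        "diff_rate": sum(s.diff for s in samples) / n,
        "p99_ms": p99([s.new_ms for s in samples]),
        "old_errors": old_errors,
        "new_errors": new_errors,
        "old_error_rate": old_errors / n,
        "new_error_rate": new_errors / n,
        "cost_ratio": sum(s.new_cost for s in samples) / old_cost if old_cost else math.inf,
    }
    blockers = []
    if report["diff_rate"] > max_diff_rate:
        blockers.append(f"diff rate {report['diff_rate']:.2%} above {max_diff_rate:.0%}")
    if report["p99_ms"] > latency_budget_ms:
        blockers.append(f"p99 {report['p99_ms']:.0f} ms over the {latency_budget_ms:.0f} ms budget")
    # Both paths see the same n requests, so comparing counts also compares rates.
    if new_errors > old_errors:
        blockers.append(f"{new_errors} 5xx on the new path vs {old_errors} on the old")
    # Cost never blocks on its own: more expensive is a budget conversation.
    notes = []
    if report["cost_ratio"] > 1:
        notes.append(f"new path costs {report['cost_ratio']:.2f}x the old one")
    report.update(go=not blockers, blockers=blockers, notes=notes)
    return report


# Demo: one diff in 200 requests, two slow outliers, 20% more expensive.
samples = [
    Sample(diff=(i == 0), old_ms=90, new_ms=500 if i < 2 else 100,
           old_error=False, new_error=False, old_cost=1.0, new_cost=1.2)
    for i in range(200)
]
result = go_no_go(samples, latency_budget_ms=250)
print(result["go"], result["blockers"], result["notes"])
```

Keeping cost as a note rather than a blocker mirrors the list: a cheaper new path is a bonus, a pricier one needs a decision, not an automatic rollback.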
What dark launches catch
Four classes of bug appear only in production traffic. Dark launch surfaces them while the old code still serves.
- Real-input edge cases. Production has odd inputs that staging never had: nulls, Unicode, empty arrays, ancient client versions.
- Performance under real load. Synthetic load rarely matches real distributions; tail latency only appears with the real shape.
- Downstream integration bugs. Some bugs manifest only at production scale and concurrency, never in unit tests.
- Hidden retries. Clients that retry on particular responses from the old code depend on that behaviour; when the new code answers those requests differently, the diff reveals an unspoken contract it would break.
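Edge-case diffs are easier to act on when grouped by what was odd about the input. A minimal Python sketch, assuming each logged mismatch carries the request payload and the client version as a tuple; the record shape, the trait names, and the `oldest_supported` cutoff are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable


def input_traits(payload: Any, client_version: tuple, oldest_supported: tuple) -> set:
    """Tag a request payload with the kinds of oddity that staging tends to miss."""
    traits = set()

    def walk(value: Any) -> None:
        if value is None:
            traits.add("null")
        elif isinstance(value, str):
            if not value.isascii():
                traits.add("non-ascii")
        elif isinstance(value, list):
            if not value:
                traits.add("empty-array")
            for item in value:
                walk(item)
        elif isinstance(value, dict):
            for item in value.values():
                walk(item)

    walk(payload)
    # Tuples compare element by element, so (2, 1) < (3, 0).
    if client_version < oldest_supported:
        traits.add("old-client")
    return traits or {"plain"}


def triage(mismatches: Iterable[dict], oldest_supported: tuple) -> Counter:
    """Count mismatches per input trait; one mismatch can land in several buckets."""
    counts = Counter()
    for record in mismatches:
        counts.update(input_traits(record["payload"], record["client_version"], oldest_supported))
    return counts


mismatches = [
    {"payload": {"name": None, "tags": []}, "client_version": (2, 1)},
    {"payload": {"name": "Zo\u00eb", "tags": ["a"]}, "client_version": (5, 0)},
    {"payload": {"name": "Al", "tags": ["b"]}, "client_version": (1, 9)},
    {"payload": {"name": "Bo", "tags": ["c"]}, "client_version": (5, 2)},
]
print(triage(mismatches, oldest_supported=(3, 0)).most_common())
```

Mismatches that land in the "plain" bucket had nothing unusual about their input, which points toward a logic difference rather than an edge case.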