The Dark Launch Validation Pattern
Run the new code in production without exposing it to users. The pattern, the metrics, and what dark launches have caught before real launches.
The shape
Production traffic flows through the new code in addition to the old. The new code's output is captured but never returned to users.
Compare the new and old outputs. Where they disagree, investigate.
The capacity test is real: the new code handles real load before it handles real customers.
What to compare
Output diffs: how often do the new and old paths disagree? A disagreement rate above 1% usually indicates a bug.
Latency: is the new code within budget? Watch the p99 in particular.
Error rate: is the new code throwing more errors than the old?
Cost: is the new code cheaper or more expensive per request?
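Computing these metrics from the captured shadow results is straightforward. A minimal sketch, assuming each shadow record holds both outputs, the new path's latency in milliseconds, and an error flag; all field names and the sample records are invented for illustration.

```python
import math

# Hypothetical shadow log: one record per request handled by both paths.
records = [
    {"old_out": "a", "new_out": "a", "new_ms": 10.5, "new_error": False},
    {"old_out": "b", "new_out": "B", "new_ms": 40.0, "new_error": False},
    {"old_out": "c", "new_out": "c", "new_ms": 9.0,  "new_error": True},
]

def p99(values):
    """Nearest-rank 99th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.99 * len(ordered)) - 1
    return ordered[rank]

n = len(records)
diff_rate = sum(r["old_out"] != r["new_out"] for r in records) / n
error_rate = sum(r["new_error"] for r in records) / n
new_p99 = p99([r["new_ms"] for r in records])

# Flag for investigation past the 1% disagreement threshold.
needs_investigation = diff_rate > 0.01
```

At real traffic volumes these aggregates would come from a metrics system rather than an in-process list, but the comparisons are the same.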
What dark launches catch
Edge cases that staging never saw. Real production receives odd inputs that test data does not.
Performance regressions under real load. Synthetic load is rarely realistic.
Integration bugs with downstream services that only manifest at scale.