Model Promotion: A Canary Ramp That Works in Production

5%, 25%, 50%, 100%. The ramp that catches regressions before they hit everyone, with the metric thresholds that gate each step.

The ramp

The promotion ramp has four stages. 5% of traffic for 24 hours catches loud regressions; 25% for 48 hours catches subtler ones, because by then the sample is large enough for statistically significant comparisons; 50% for 48 hours is the final validation; 100% promotes the new model, and the old one stays warm for 7 days for fast rollback.
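The schedule above can be written down as data, which makes it easy to review and to drive from a script. This is a minimal sketch, not a reference implementation: the percentages, hold times, and 7-day warm window come from the text, while the names `Stage`, `RAMP`, `next_stage`, and `hours_to_full_promotion` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    traffic_pct: int  # share of traffic routed to the new model
    hold_hours: int   # how long the stage is held before the gates decide


# Values taken from the ramp described above. The 100% stage has no hold;
# instead the old model stays warm for WARM_STANDBY_DAYS.
RAMP = [Stage(5, 24), Stage(25, 48), Stage(50, 48), Stage(100, 0)]
WARM_STANDBY_DAYS = 7


def next_stage(current_pct: int) -> Optional[Stage]:
    """Return the first stage above current_pct, or None once fully promoted."""
    for stage in RAMP:
        if stage.traffic_pct > current_pct:
            return stage
    return None


def hours_to_full_promotion() -> int:
    """Minimum wall-clock hours from ramp start to 100%, assuming no aborts."""
    return sum(stage.hold_hours for stage in RAMP)
```

With these numbers, a ramp that never aborts reaches 100% after 120 hours (5 days), and the old model can be retired 7 days after that.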

Metric gates per stage

Each stage gates on four metrics, all measured against the previous model. Latency p99 cannot regress more than 10%; error rate cannot regress at all; quality (eval score) cannot regress more than 2 percentage points; cost can grow up to 15%, and anything beyond that needs explicit approval.
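As a rough illustration, the four gates reduce to four comparisons between the previous model and the canary. The thresholds below are the ones stated above; the `Metrics` fields, their units, and the `failed_gates` function are assumptions made for this sketch. It compares raw values and leaves out any handling of statistical noise, and it treats a cost overrun as a failed gate until someone approves it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Metrics:
    latency_p99_ms: float  # p99 latency in milliseconds
    error_rate: float      # fraction of failed requests, e.g. 0.002
    eval_score: float      # quality score in percent (0-100)
    cost_per_1k: float     # cost per 1,000 requests, any consistent unit


def failed_gates(baseline: Metrics, canary: Metrics) -> List[str]:
    """Return the names of the gates the canary fails against the baseline."""
    failures = []
    # Latency p99 may regress by at most 10%.
    if canary.latency_p99_ms > baseline.latency_p99_ms * 1.10:
        failures.append("latency_p99")
    # Error rate may not regress at all.
    if canary.error_rate > baseline.error_rate:
        failures.append("error_rate")
    # Quality may drop by at most 2 percentage points.
    if baseline.eval_score - canary.eval_score > 2.0:
        failures.append("quality")
    # Cost may grow by at most 15% without explicit approval.
    if canary.cost_per_1k > baseline.cost_per_1k * 1.15:
        failures.append("cost")
    return failures
```

An empty list means the stage may advance; any entry names the gate to investigate.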

Aborting the ramp

Any gate failure halts the ramp and fires alerts. On-call rolls back with one command, and because the previous model is still warm it takes over instantly. Aborts are loud, and the postmortem documents which gate failed, what data showed it, and what the fix is. Most aborts come from latency or cost regressions: those are immediately visible, while quality regressions are subtle.
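The abort path can be sketched as a single decision: no failures keeps the current traffic share, any failure sends canary traffic to zero and opens a postmortem record with the three fields named above. Everything here (`AbortRecord`, `handle_gate_result`, the field names) is hypothetical; alerting and the actual rollback command are outside the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AbortRecord:
    stage_pct: int               # traffic share at the time of the abort
    failed: List[str]            # which gate(s) failed
    evidence: Dict[str, float]   # what data showed the failure
    fix: str = "TBD"             # filled in during the postmortem


def handle_gate_result(
    stage_pct: int,
    failed: List[str],
    evidence: Dict[str, float],
) -> Tuple[int, Optional[AbortRecord]]:
    """Return the new canary traffic share and an abort record, if any."""
    if not failed:
        # All gates passed: hold the current share until the stage completes.
        return stage_pct, None
    # Any failure halts the ramp; the warm previous model takes all traffic.
    return 0, AbortRecord(stage_pct=stage_pct, failed=list(failed), evidence=dict(evidence))
```

Returning the record rather than only logging it keeps the postmortem inputs attached to the abort itself.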

Eval coverage during ramp

Eval coverage spans the ramp. Pre-ramp: the full eval suite passes with no exceptions. During the ramp: a subset of evals runs hourly on canary traffic and confirms that live results match the offline eval. Post-ramp: the full eval suite runs again at 100%, so the release is documented before and after.
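One way to read the hourly check is as a comparison between the offline eval score and the mean score of the eval subset on canary traffic. The sketch below assumes that reading. The function name `matches_offline` and the `tolerance` parameter are hypothetical; the text gives no tolerance, so the caller must supply one.

```python
from statistics import mean
from typing import Sequence


def matches_offline(
    offline_score: float,
    canary_scores: Sequence[float],
    *,
    tolerance: float,
) -> bool:
    """True if the mean canary eval score is within tolerance of the offline score."""
    if not canary_scores:
        # An empty window is a monitoring failure, not a pass.
        raise ValueError("no canary eval results for this window")
    return abs(mean(canary_scores) - offline_score) <= tolerance
```

For example, `matches_offline(90.0, [89.0, 91.0], tolerance=1.0)` passes, while `matches_offline(90.0, [85.0, 86.0], tolerance=1.0)` does not.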