AI & ML Practical By Samson Tanimawo, PhD Published Jul 30, 2026 4 min read

Model Promotion: A Canary Ramp That Works in Production

5%, 25%, 50%, 100%. The ramp that catches regressions before they hit everyone, with the metric thresholds that gate each step.

The ramp

Stage 1: 5% of traffic for 24 hours. Catches loud regressions (latency, errors) and obvious quality drops.

Stage 2: 25% for 48 hours. Catches subtler regressions; sample size large enough for stat-sig comparisons.

Stage 3: 50% for 48 hours. Final validation; if metrics hold, promote to 100%.

Stage 4: 100%. The new model is live; old model stays warm for 7 days for rollback.

Metric gates per stage

Latency p99 cannot regress more than 10% vs the previous model.

Error rate cannot regress at all.

Quality (eval score) cannot regress more than 2 percentage points vs the previous model on the standard suite.

Cost can grow up to 15%; beyond that requires explicit approval.

Aborting the ramp

Any gate fails: ramp halts, alerts fire, on-call rolls back. The rollback is one command; the warm previous model takes over.

Aborts are loud. Postmortem follows: which gate, what data, what fix.

Most aborts come from latency or cost regressions, not quality. Quality regressions are subtle; latency and cost are visible.

Eval coverage during ramp

Pre-ramp: full eval suite passes. No exceptions.

During ramp: subset of evals run hourly on canary traffic. Confirms the ramp matches the offline eval.

Post-ramp: full eval suite at 100%. The release is documented with evals before and after.