Canary Noise Tolerance
Don't fail canary on minor noise.
Noise floor thresholds
1-2% deviation accepted as noise. Below this, no canary failure.
Per-metric tuning. Latency tolerates more noise than error rate.
Business hours vs off-hours. Off-hours has lower traffic; higher noise.
Statistical significance
Need stat-sig delta to fail canary. Random spikes don't trigger rollback.
Sample size matters. Low-traffic services need longer canary windows.
Confidence intervals: 95% threshold for rollback. Tunable per criticality.
Auto-tuning baselines
Track baseline noise per metric. Tune thresholds based on historical variance.
Detect deviations from learned baseline. Beats fixed thresholds.
Re-baseline on intentional changes. New service version invalidates old baseline.
Manual override
Engineer can promote canary early if confident. Skip remaining stages.
Engineer can fail canary manually if observing issues outside the metrics.
All overrides logged. Quarterly review of override patterns.