MTTD: The Metric Behind MTTR
MTTR includes detection time, so reducing MTTD often reduces MTTR more than speeding up the response does.
MTTD
Mean time to detect is the gap between when an issue starts affecting users and when monitoring tells someone about it. In many postmortems it is the largest component of total time-to-resolve, so investment here often pays back faster than investment in the later phases.
- Issue start to detection. The unobserved-degradation window. Often longer than the response time itself.
- Largest component of MTTR. The dominant slice of total time-to-resolve. Most MTTR variance comes from MTTD variance.
- Explicit measurement per incident. Issue start captured separately from alert-fire time. Without the split, hidden detection delays disappear into “response time”.
- Issue-start source from monitoring. Retroactive examination of the underlying metric. Issue start is pulled from the data, not assumed to be the moment the page fired.
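A minimal sketch of the retroactive lookup described above, assuming the metric is available as (timestamp, value) samples. The function name `find_issue_start`, the 5% error-rate threshold, the three-sample run length, and the sample data are all hypothetical choices for illustration, not a prescribed method.

```python
from datetime import datetime, timedelta


def find_issue_start(samples, threshold, min_consecutive=3):
    """Return the timestamp of the first sample in the earliest run of
    `min_consecutive` samples above `threshold`, or None if there is none.

    Requiring a sustained run keeps a single noisy spike from being
    recorded as the start of the incident.
    """
    run_start = None
    run_length = 0
    for timestamp, value in samples:
        if value > threshold:
            if run_length == 0:
                run_start = timestamp
            run_length += 1
            if run_length >= min_consecutive:
                return run_start
        else:
            run_length = 0
            run_start = None
    return None


# Hypothetical one-minute error-rate samples around an incident.
t0 = datetime(2024, 1, 1, 12, 0)
error_rates = [0.01, 0.01, 0.06, 0.01, 0.07, 0.08, 0.09, 0.12, 0.15]
samples = [(t0 + timedelta(minutes=i), rate) for i, rate in enumerate(error_rates)]

issue_start = find_issue_start(samples, threshold=0.05)

# Alert-fire time comes from the paging system (assumed value here).
alert_fired = t0 + timedelta(minutes=8)
detection_delay = alert_fired - issue_start
print(f"issue start: {issue_start}, time to detect: {detection_delay}")
```

The isolated spike at minute 2 is ignored because it does not persist, so the recorded start is the beginning of the sustained degradation rather than the first noisy sample.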
Improve
MTTD improvement comes from earlier-firing alerts and broader coverage. Wins compound across hundreds of incidents because the same detection latency repeats every time the same class of fault recurs.
- Tighter alert thresholds. Per-service review of trigger points. Catches degradation closer to its actual start.
- Synthetic monitoring coverage. Active probes against critical paths. Surfaces issues before customers report them.
- Accepted false-positive cost. Trade some extra false positives for earlier detection, but only on the highest-impact alerts.
- Quarterly missed-detection review. Customer-reported-first incidents become the input list. The gaps stop hiding.
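The threshold trade-off in the list above can be checked against historical data before any alert is changed. The sketch below replays one incident and one healthy period against several candidate thresholds. The latency values, the candidate thresholds, and the helper names `first_crossing` and `count_crossings` are illustrative assumptions.

```python
def first_crossing(values, threshold):
    """Index of the first value above `threshold`, or None if none crosses."""
    for index, value in enumerate(values):
        if value > threshold:
            return index
    return None


def count_crossings(values, threshold):
    """Number of times the series rises from at-or-below to above `threshold`."""
    count = 0
    above = False
    for value in values:
        now_above = value > threshold
        if now_above and not above:
            count += 1
        above = now_above
    return count


# Hypothetical p99 latency in milliseconds, one sample per minute.
incident = [210, 220, 260, 310, 380, 450, 520, 610]           # known degradation
healthy = [200, 215, 255, 205, 210, 262, 220, 208, 240, 212]  # known-good period

# For each candidate: (minute the alert would have fired, false fires on healthy data).
results = {
    threshold: (first_crossing(incident, threshold), count_crossings(healthy, threshold))
    for threshold in (500, 400, 300, 250)
}

for threshold, (fire_minute, false_fires) in results.items():
    print(f"threshold {threshold} ms: fires at minute {fire_minute}, "
          f"{false_fires} false fires on healthy data")
```

On this sample data, lowering the threshold from 500 ms to 300 ms fires three minutes earlier with no false fires, while 250 ms gains one more minute at the cost of two false fires. That is the kind of comparison a per-service threshold review needs.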
Track
Tracking MTTD requires capturing both timestamps for every incident. The gap between them is the optimisation target; without the data, no investment can be justified.
- Issue start versus alert fire. Two timestamps captured per incident. Their difference is that incident's time to detect; the mean across incidents is MTTD.
- Gap is the optimisation target. Documented per-incident gap drives the investment case for earlier detection.
- Quarterly MTTD trend chart. Trajectory over time. Catches detection degrading silently when alert quality erodes.
- Per-service MTTD breakdown. Worst-detection services surface the targets for alerting investment.
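Once both timestamps are recorded per incident, the trend and the per-service breakdown are simple aggregations. This is a sketch under assumed field names (`service`, `issue_start`, `alert_fired`) and made-up incident records. A real incident tracker will have its own schema.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names are assumptions for illustration.
incidents = [
    {"service": "checkout", "issue_start": datetime(2024, 1, 10, 9, 0),
     "alert_fired": datetime(2024, 1, 10, 9, 12)},
    {"service": "search", "issue_start": datetime(2024, 2, 3, 14, 0),
     "alert_fired": datetime(2024, 2, 3, 14, 4)},
    {"service": "checkout", "issue_start": datetime(2024, 4, 21, 22, 30),
     "alert_fired": datetime(2024, 4, 21, 22, 48)},
    {"service": "search", "issue_start": datetime(2024, 5, 8, 7, 15),
     "alert_fired": datetime(2024, 5, 8, 7, 21)},
]


def time_to_detect_minutes(incident):
    """Gap between issue start and alert fire for one incident, in minutes."""
    return (incident["alert_fired"] - incident["issue_start"]).total_seconds() / 60


def quarter(incident):
    """Calendar-quarter label such as '2024-Q1', based on issue start."""
    start = incident["issue_start"]
    return f"{start.year}-Q{(start.month - 1) // 3 + 1}"


def mean_mttd_by(records, key):
    """Mean time to detect, grouped by whatever `key(incident)` returns."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(time_to_detect_minutes(record))
    return {group: mean(values) for group, values in groups.items()}


by_service = mean_mttd_by(incidents, lambda incident: incident["service"])
by_quarter = mean_mttd_by(incidents, quarter)

# Worst-detection services first: these are the alerting investment targets.
worst_first = sorted(by_service.items(), key=lambda item: item[1], reverse=True)

print("MTTD by quarter (min):", dict(sorted(by_quarter.items())))
print("MTTD by service, worst first (min):", worst_first)
```

Grouping on issue start rather than alert-fire time keeps an incident in the quarter where the degradation began, which matters when detection is slow enough to straddle a boundary.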