Back to glossary
GLOSSARY · M

MTTR (Mean Time to Resolve)

The average time from an incident starting to the service being restored to normal, the headline reliability metric for SRE teams.

Definition

MTTR, Mean Time to Resolve (sometimes Repair, sometimes Recover), is the average elapsed time between an incident starting and the service being restored to its normal state. MTTR includes detection time (MTTD), acknowledgment time, diagnosis time, action time, and verification time. Each phase is independently optimizable: faster monitoring, smaller on-call rotations, better runbooks, agent-driven remediation, post-fix smoke tests.

Why it matters

MTTR is the headline number on every SRE dashboard because it directly maps to user pain. A 47-minute MTTR means 47 minutes of customers seeing errors, 47 minutes of on-call engineers in war rooms, 47 minutes of reputation damage. Industry-leading MTTR for routine pages is now under 10 minutes, and Agentic SRE platforms are pushing it under 3 for known patterns.

How Nova handles it

See the part of the platform that handles mttr (mean time to resolve) in production.

Nova MTTR benchmarks