MTBF (Mean Time Between Failures)

The average time a service stays up between failures, an inverse measure of how often it breaks.

Definition

MTBF, Mean Time Between Failures, is the average elapsed time between successive failures of a service or component. If your service had 3 outages in 30 days totaling 4 hours of downtime, your MTBF is roughly (30*24 - 4) / 3 = 239 hours between failures. MTBF is paired with MTTR, the mean time to recovery, to describe a service's reliability profile: a service can have a long MTBF but a long MTTR (rare, costly outages) or a short MTBF but a short MTTR (frequent, fast recoveries).
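
The arithmetic in the 30-day example can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the function names mtbf_hours and mttr_hours and their parameters are hypothetical, and it assumes MTBF is measured as uptime divided by the number of failures within the observation window.

```python
def mtbf_hours(window_hours: float, downtime_hours: float, failures: int) -> float:
    """Mean uptime between failures over an observation window (assumed definition)."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return (window_hours - downtime_hours) / failures


def mttr_hours(downtime_hours: float, failures: int) -> float:
    """Mean time to recover: total downtime spread across the failures."""
    if failures <= 0:
        raise ValueError("MTTR is undefined without at least one failure")
    return downtime_hours / failures


# The example from the definition: 3 outages in 30 days, 4 hours of downtime in total.
window = 30 * 24
print(f"MTBF: {mtbf_hours(window, 4, 3):.1f} h")  # MTBF: 238.7 h
print(f"MTTR: {mttr_hours(4, 3):.2f} h")          # MTTR: 1.33 h
```

Subtracting downtime from the window keeps MTBF a measure of time spent up; some teams divide the full window by the failure count instead, which gives a slightly larger figure.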

Why it matters

MTBF tells you how often the service breaks; MTTR tells you how long it stays broken. Teams that optimize one in isolation often pick the wrong one: you can engineer for never-fail and lose all your time to the one outage that does happen, or you can engineer for fast recovery and accept routine breakage. Tracking both forces an honest tradeoff.
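
One way to see why both numbers are needed is the common steady-state availability formula, MTBF / (MTBF + MTTR). The sketch below uses it to compare two made-up services; the function name availability and all of the figures are illustrative assumptions, not measurements.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical profiles: rare but costly outages vs. frequent but fast recoveries.
rare_costly = availability(mtbf_hours=2000, mttr_hours=10)
frequent_fast = availability(mtbf_hours=200, mttr_hours=1)

print(f"rare, costly outages:      {rare_costly:.4%}")
print(f"frequent, fast recoveries: {frequent_fast:.4%}")
```

Both profiles come out at the same availability, yet they feel very different to operate: one produces a ten-hour incident every few months, the other a one-hour incident roughly every week. An availability figure alone hides that difference, which is why MTBF and MTTR are worth tracking separately.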

How Nova handles it

See the part of the platform that handles MTBF (Mean Time Between Failures) in production.

Nova reliability dashboard