The average time a service stays up between failures, an inverse measure of how often it breaks.
MTBF (Mean Time Between Failures) is the average elapsed time between successive failures of a service or component. If your service had 3 outages in 30 days totaling 4 hours of downtime, your MTBF is (30*24 - 4)/3, or roughly 239 hours between failures. MTBF is paired with MTTR (mean time to recovery) to describe a service's reliability profile: a service can have a long MTBF and a long MTTR (rare, costly outages) or a short MTBF and a short MTTR (frequent, fast recoveries).
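The arithmetic in the example can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. The function names `mtbf_hours` and `mttr_hours` are hypothetical, and MTTR is assumed here to be total downtime divided by the number of failures.

```python
def mtbf_hours(window_hours: float, downtime_hours: float, failures: int) -> float:
    """Average uptime per failure over an observation window."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return (window_hours - downtime_hours) / failures


def mttr_hours(downtime_hours: float, failures: int) -> float:
    """Average downtime per failure (assumed definition of MTTR)."""
    if failures <= 0:
        raise ValueError("MTTR is undefined without at least one failure")
    return downtime_hours / failures


# The example from the text: 3 outages in 30 days, 4 hours of total downtime.
window = 30 * 24
print(round(mtbf_hours(window, 4, 3), 1))  # 238.7
print(round(mttr_hours(4, 3), 2))          # 1.33
```

With zero failures in the window, both values are undefined, so the sketch raises an error instead of returning a misleading number.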
MTBF tells you how often the service breaks; MTTR tells you how long it stays broken. Optimizing either one in isolation backfires: you can engineer for never-fail and lose all your time to the one outage that does happen, or you can engineer for fast recovery and accept routine breakage. Tracking both forces an honest tradeoff.
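One way to see why both numbers matter is to compare two services through the common steady-state availability formula, MTBF / (MTBF + MTTR), taking MTBF as uptime between failures. The sketch below is illustrative only: the `availability` helper and both sets of figures are hypothetical, not measurements from any real service.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical profiles, chosen only to illustrate the tradeoff.
rare_costly = availability(mtbf_hours=2000.0, mttr_hours=20.0)   # rare, long outages
frequent_fast = availability(mtbf_hours=50.0, mttr_hours=0.5)    # frequent, short outages

print(f"{rare_costly:.4f}")    # 0.9901
print(f"{frequent_fast:.4f}")  # 0.9901
```

Both profiles come out at about 99% availability, yet they fail in very different ways. A single availability figure hides that difference, which is why MTBF and MTTR are reported together.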
See the part of the platform that handles MTBF (Mean Time Between Failures) in production.