GLOSSARY · O

Outage

A period during which a service fails to meet its availability SLO, the event that triggers incident response.

Definition

An outage is a period when a service is failing to meet its availability target, either fully unavailable or sufficiently degraded to count against the error budget. Outages are scoped (Sev-1 = entire service down for everyone, Sev-3 = one feature degraded for one tenant), recorded with start/end timestamps, and accounted against the SLO. Each outage produces a postmortem; recurring outages drive the action items engineering invests in next.

Why it matters

How a team accounts for outages defines its reliability culture: rigorous teams record every minute of degradation against SLO, sloppy teams round down to zero. The rigorous accounting is what makes the error budget a usable forcing function, the sloppy version turns it into theater. Mature teams publish public status pages with their own outage history because nothing enforces the discipline like sunlight.

How Nova handles it

See the part of the platform that handles outage in production.

Nova reliability snapshot

Outage

Definition

Why it matters

How Nova handles it

Related terms