Error Budget vs Availability: Stop Confusing Them
The availability target is the SLO; the error budget is the unreliability that target allows you to spend. The distinction matters; conflating the two produces bad decisions.
Definitions
Error budget and availability are related concepts that are often conflated. Availability is the target; the error budget is the gap between that target and 100%, the unreliability the target permits. The distinction matters because the two support different conversations and decisions; conflating them produces muddled thinking.
What the definitions are:
- Availability: the target.: The team commits to 99.9% uptime (or whatever the SLO is). The target is the policy: the team's reliability commitment.
- 99.9% uptime, for example.: Specific SLO targets vary by service. The number expresses how reliable the service should be; the customers and the business reference this number.
- Error budget: what you may spend between the target and 100%.: The error budget is the complement of the target. 99.9% availability allows 0.1% downtime; 0.1% of a 30-day month is about 43 minutes. The budget is the allowed unreliability.
- 0.1% of 30 days is about 43 minutes per month.: The math translates the percentage into time. A 30-day month has 43,200 minutes, and 0.1% of that is 43.2 minutes. The team can afford roughly 43 minutes of downtime per month while staying within SLO.
- The two are complementary.: Availability is the target; the budget is the allowance. Together they describe the team's reliability commitment fully.
The definitions are simple; the distinction matters for how they are used.
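The arithmetic above can be sketched in a few lines. This is a minimal illustration, assuming a fixed 30-day window and a purely time-based SLO; the function name `error_budget_minutes` is hypothetical and not part of any SLO platform's API.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the allowed downtime, in minutes, for an availability target.

    slo_target is a fraction (0.999 for 99.9%); window_days is the SLO window.
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be a fraction in (0, 1]")
    window_minutes = window_days * 24 * 60  # 30 days -> 43,200 minutes
    # The budget is the complement of the target, applied to the window.
    return (1.0 - slo_target) * window_minutes


if __name__ == "__main__":
    # A 99.9% target over 30 days allows roughly 43.2 minutes of downtime.
    print(f"{error_budget_minutes(0.999):.1f} minutes")
```

A request-based SLO would use the same complement, applied to total requests instead of minutes.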
How they are used
Availability and error budget support different conversations. Availability is for customers; the budget is for engineering. The audiences differ; the language differs.
- Availability is the customer-facing commitment.: Customers care about availability. Contracts reference it; SLAs include it; customer reports show it. The customer's perspective is the availability number.
- Error budget is the engineering trade-off currency.: The team's internal discussions use the budget. Should we ship this risky change? Do we have budget for the experiment? The budget is the currency the team trades in.
- Spend on innovation.: The budget allows risk-taking. New features, bold experiments, deployment automation all consume budget. The team accepts some unreliability to enable the work.
- Save for stability.: When the budget is tight, the team reduces risk. Slower deploys, more conservative changes, focus on stability work. The budget level guides the engineering posture.
- Different audiences need different metrics.: The customer does not care about budget; the engineer does not optimize for availability directly. The two metrics serve different audiences with different concerns.
The use determines the discussion. Each metric supports the conversation it was designed for.
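The spend-or-save posture described above can be written down as a small policy function. This is only a sketch: the `Posture` names and the 50% and 10% thresholds are illustrative assumptions, not values from the text or from any standard, and each team would choose its own.

```python
from enum import Enum


class Posture(Enum):
    SPEND = "spend on innovation"
    CAUTIOUS = "slow down risky changes"
    SAVE = "save for stability"


def engineering_posture(
    remaining_fraction: float,
    caution_below: float = 0.5,  # assumed threshold, tune per team
    save_below: float = 0.1,     # assumed threshold, tune per team
) -> Posture:
    """Map the fraction of error budget still unspent to a posture.

    remaining_fraction is 1.0 for an untouched budget, 0.0 for an
    exhausted one, and negative once the SLO has been breached.
    """
    if remaining_fraction < save_below:
        return Posture.SAVE
    if remaining_fraction < caution_below:
        return Posture.CAUTIOUS
    return Posture.SPEND


if __name__ == "__main__":
    for fraction in (0.8, 0.3, 0.05):
        print(fraction, engineering_posture(fraction).value)
```

The input is the remaining budget, not the availability number, which keeps the engineering decision tied to the engineering metric.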
Conflation pitfall
The conflation produces muddled thinking. Engineers sometimes say "we are at 99.95% availability" as if that is the budget; it is not. Getting the math and language right produces clearer conversations.
- Engineers say "we are at 99.95% availability" as if that is the budget.: The 99.95% is measured availability; the budget is the allowance set by the target, 0.1% for a 99.9% SLO. Conflating them confuses the conversation.
- It is not; the shortfall behind it is the burn.: At 99.95% against a 99.9% target, the service has burned 0.05 percentage points of its 0.1% allowance, which is half the budget; the other 0.05 points are what is left. The budget is the allowance; the burn is what has been consumed.
- What matters is the budget that is left, not what is consumed.: The distinction is foundational. The team's decisions about risk-taking should reference what is left in the budget; what is consumed is history.
- Get the math right.: The math is straightforward but easy to get wrong. The team's tooling should compute and display both: current availability and remaining budget. Conversations reference both correctly.
- The conversations follow.: When the math is right, the conversations are clearer. Engineering trade-offs use the budget; customer reports use availability; nobody is confused; decisions are better.
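To make "compute and display both" concrete, here is a minimal sketch that derives availability, burn, and remaining budget from the same raw counts. It assumes a request-based SLO measured as good events over total events; `BudgetReport` and `budget_report` are hypothetical names, not the API of any SLO platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetReport:
    availability: float        # measured, e.g. 0.9995
    budget: float              # allowance: 1 - SLO target
    burned: float              # unavailability consumed so far
    consumed_fraction: float   # burned / budget
    remaining_fraction: float  # 1 - consumed_fraction; negative if breached


def budget_report(slo_target: float, good_events: int, total_events: int) -> BudgetReport:
    """Compute availability and error-budget figures side by side."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    if total_events <= 0 or not 0 <= good_events <= total_events:
        raise ValueError("need 0 <= good_events <= total_events and total_events > 0")
    availability = good_events / total_events
    budget = 1.0 - slo_target
    burned = 1.0 - availability
    consumed = burned / budget
    return BudgetReport(availability, budget, burned, consumed, 1.0 - consumed)


if __name__ == "__main__":
    report = budget_report(0.999, good_events=99_950, total_events=100_000)
    print(f"availability: {report.availability:.2%}")           # customer-facing
    print(f"budget remaining: {report.remaining_fraction:.0%}")  # engineering-facing
```

With a 99.9% target and 99.95% measured availability, the report shows about half the budget consumed and half remaining, which is the reading the bullets above describe.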
The distinction between error budget and availability is one of those small clarities that produces large benefits in team conversations. Nova AI Ops integrates with SLO platforms, displays both availability and remaining budget, and supports the conversations that mature SLO operations require.