SLO Math Cheat Sheet
Every number you need to defend an SLO target in a meeting, without opening a calculator and without saying "I'll get back to you."
Availability to downtime
Memorise this table. The single most common SLO conversation is "what does N nines actually mean?" and the answer should be instant.
- 99% (two nines), 7h 12m/month, 3.65d/year, 1h 40m/week
- 99.5%, 3h 36m/month, 1.83d/year, 50m/week
- 99.9% (three nines), 43.2 min/month, 8.76h/year, 10.1 min/week
- 99.95%, 21.6 min/month, 4.38h/year, 5.04 min/week
- 99.99% (four nines), 4.32 min/month, 52.6 min/year, 1.01 min/week
- 99.995%, 2.16 min/month, 26.3 min/year
- 99.999% (five nines), 25.9 sec/month, 5.26 min/year
Formula: downtime = (1 - SLO) × window. A 30-day month is 43,200 minutes; a year is 525,600.
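The formula is one line of code; a quick sketch (the window table and function name are illustrative, not a standard API):

```python
# downtime = (1 - SLO) x window, windows in minutes:
# 30-day month = 43,200 min, 365-day year = 525,600 min
WINDOW_MIN = {"week": 7 * 24 * 60, "month": 30 * 24 * 60, "year": 365 * 24 * 60}

def downtime_minutes(slo: float, window: str) -> float:
    """Minutes of allowed downtime for an SLO (e.g. 0.999) over a window."""
    return (1 - slo) * WINDOW_MIN[window]

downtime_minutes(0.999, "month")   # ~43.2 min
downtime_minutes(0.9999, "year")   # ~52.6 min
```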
Error budget
The budget is what you're allowed to spend on outages, deploys, experiments. Spend it wisely or freeze releases.
- Budget = 1 - SLO. 99.9% SLO → 0.1% budget
- Allowed bad events = budget × total events. 0.1% × 10M requests = 10,000 errors allowed/window
- Budget remaining = 1 - (errors_observed / errors_allowed)
- Budget consumed % = (1 - actual_SLI) / (1 - SLO_target)
- If consumed > 100%, you've blown the budget: stop pushing risky changes
- If consumed < 25% with two weeks left, you're being too conservative: ship more
Burn rate
Burn rate is "how many times faster than allowed are you spending the budget right now?" A burn rate of 1.0 spends the entire budget exactly over the SLO window. Higher = faster.
- Burn rate = (error_rate_now) / (1 - SLO_target)
- Burn rate 1 exhausts the budget exactly at the end of the window
- Burn rate 14.4 exhausts a 30-day budget in 50 hours (Google SRE's classic fast-burn threshold)
- Burn rate 36 exhausts it in 20 hours
- Burn rate 720 exhausts it in 1 hour (the page-now value)
- Multi-window: alert when 1h burn > 14.4 and 5m burn > 14.4 (cuts false pages)
- Slow-burn: alert when 6h burn > 6 and 30m burn > 6
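The burn-rate math and the multi-window check translate directly; thresholds are the ones from the list above, helper names are illustrative:

```python
def burn_rate(error_rate: float, slo: float) -> float:
    """How many times faster than allowed the budget is being spent."""
    return error_rate / (1 - slo)

def hours_to_exhaustion(rate: float, window_hours: float = 30 * 24) -> float:
    """Hours until a 30-day (720h) budget is gone at a constant burn rate."""
    return window_hours / rate

def fast_burn_page(burn_1h: float, burn_5m: float, threshold: float = 14.4) -> bool:
    """Multi-window fast-burn alert: page only if BOTH windows exceed the threshold."""
    return burn_1h > threshold and burn_5m > threshold

# A 1.44% error rate against a 99.9% SLO is a 14.4x burn: ~50h to empty
rate = burn_rate(0.0144, 0.999)
```

Requiring both the long and the short window to breach is what cuts the false pages: the 1h window proves the burn is sustained, the 5m window proves it is still happening.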
Composition
Stack services in series and the availabilities multiply (so availability drops and the error budgets roughly add). Stack in parallel behind a load balancer and availability climbs.
- Series (A then B): SLO_total = SLO_A × SLO_B. Two 99.9% services in series = 99.8%
- Parallel (A or B): SLO_total = 1 - (1 - SLO_A) × (1 - SLO_B). Two 99.9% replicas in parallel = 99.9999%
- Three services at 99.9% in series = 99.7%, or 2.16h/month downtime
- Want 99.99% end-to-end on a 5-service path? Each service needs ~99.998%
- Always set the SLO at the user-visible boundary, not per-service. Per-service targets are budgets, not promises
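The series/parallel rules in code (a sketch; `per_service_target` assumes the budget is split equally across the chain, per the 5-service example above):

```python
from math import prod

def series(*slos: float) -> float:
    """Chain of dependencies: availabilities multiply, so the total drops."""
    return prod(slos)

def parallel(*slos: float) -> float:
    """Redundant replicas: the system fails only when every replica fails."""
    return 1 - prod(1 - s for s in slos)

def per_service_target(end_to_end: float, n: int) -> float:
    """Equal per-service SLO needed for n services in series to hit a target."""
    return end_to_end ** (1 / n)

series(0.999, 0.999)            # ~0.998
parallel(0.999, 0.999)          # ~0.999999
per_service_target(0.9999, 5)   # ~0.99998
```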
Latency SLOs
Availability isn't enough: slow is broken. Latency SLOs read like "95% of requests under 250ms over 30 days."
- Good event = request with latency ≤ threshold. Bad event = anything slower or any error
- SLI = good / total. SLO is the target percentage
- Two-threshold SLO: 95% < 250ms and 99% < 1000ms catches both typical latency and the tail
- Avoid SLOs on average latency: averages hide the tail you actually care about
- p99 of p99s is not the system's p99; quantiles don't compose. Aggregate from raw histograms
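Counting good events from raw samples, per the rules above (an illustrative sketch; production systems compute this from event streams or merged histograms, never from averaged percentiles):

```python
def latency_sli(latencies_ms: list[float], threshold_ms: float, errors: int = 0) -> float:
    """SLI = good / total. Good = request at or under the latency threshold;
    errors count as bad events even when they return quickly."""
    total = len(latencies_ms) + errors
    good = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return good / total

samples = [120.0, 180.0, 240.0, 260.0, 900.0, 1500.0]  # hypothetical raw latencies
latency_sli(samples, 250)    # 0.5: three of six at or under 250ms
latency_sli(samples, 1000)   # ~0.83: five of six at or under 1s
```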