SLO & Reliability, Practical
By Samson Tanimawo, PhD
Published Sep 16, 2025

Error Budget vs Resource Quotas

Both put a bound on something, but they bound different things for different reasons.

Error budget

An error budget is the complement of an SLO. If you commit to 99.9% availability over 28 days, your error budget is the 0.1% of that window in which you are allowed to be unavailable, which works out to roughly 40 minutes (0.001 × 40,320 minutes). The budget is not permission to fail. It is a unit of currency that lets engineering and product talk about reliability investment with the same precision they use for everything else.
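The arithmetic behind the 40-minute figure can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function names `error_budget_minutes` and `budget_remaining` are hypothetical, and it assumes a time-based availability SLO expressed as a fraction.

```python
def error_budget_minutes(slo: float, window_days: float) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo) * window_minutes


def budget_remaining(slo: float, window_days: float, downtime_minutes: float) -> float:
    """Fraction of the budget still unspent (negative once the SLO is missed)."""
    budget = error_budget_minutes(slo, window_days)
    return 1.0 - downtime_minutes / budget


if __name__ == "__main__":
    # 99.9% over 28 days, as in the example above.
    print(f"{error_budget_minutes(0.999, 28):.2f} minutes")  # 40.32 minutes
    # After a hypothetical 10-minute outage, about three quarters is left.
    print(f"{budget_remaining(0.999, 28, 10):.0%} of budget left")
```

Expressing the remainder as a fraction rather than raw minutes makes it easier to compare services with different SLOs and windows.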

What makes an error budget different from any other metric:

The error budget is the most important number in an SRE practice that is actually working. It bridges the gap between "we want it reliable" and "we are investing in reliability."

Quota

A resource quota is a hard ceiling on how much of something a service or tenant is allowed to consume at any moment: CPU, memory, requests per second, concurrent connections, or queue depth. Quotas exist for a different reason than budgets: they prevent one consumer from starving others or melting shared infrastructure.

Quotas are blunt and necessary. Without them, a single bad consumer can take down a shared service for everyone. With them, each consumer has predictable boundaries and the platform survives the worst case.
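To make the "hard ceiling" concrete, here is a minimal sketch of a per-tenant quota on concurrent requests. The class name `ConcurrencyQuota` and the tenant names are hypothetical, and the sketch is single-threaded for brevity; a real service would need locking or an equivalent mechanism around the counters.

```python
from collections import defaultdict


class ConcurrencyQuota:
    """Hard per-tenant ceiling on in-flight requests (hypothetical helper)."""

    def __init__(self, limit: int) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self._in_flight = defaultdict(int)  # tenant -> requests in progress

    def try_acquire(self, tenant: str) -> bool:
        """Admit the request if the tenant is under its ceiling, else reject."""
        if self._in_flight[tenant] >= self.limit:
            return False  # a caller would typically answer with HTTP 429 here
        self._in_flight[tenant] += 1
        return True

    def release(self, tenant: str) -> None:
        """Mark one of the tenant's requests as finished."""
        if self._in_flight[tenant] > 0:
            self._in_flight[tenant] -= 1


quota = ConcurrencyQuota(limit=2)
results = [quota.try_acquire("noisy") for _ in range(3)]  # third call is rejected
other = quota.try_acquire("quiet")  # a different tenant is unaffected
```

The noisy tenant hits its ceiling on the third request while the quiet tenant is still admitted, which is the isolation property the quota exists to provide.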

Layer

Error budgets and resource quotas are not alternatives. They live at different layers and answer different questions, and a serious operational practice uses both.

Quotas keep the system standing up. Error budgets keep the team aimed at the right reliability target. Treating them as the same thing is how teams end up with a great capacity story and a missed SLO.

Nova AI Ops tracks both layers in one view: per-tenant quota saturation alongside per-service error budget burn, with alerting tuned to surface the pattern where the two cross over (a clipped quota that burns budget) before it becomes a customer escalation.
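The crossover pattern, where quota rejections show up as SLO-violating errors, can be sketched as a burn-rate calculation. This is an illustration only and does not describe how any particular product implements it. It assumes the SLI counts quota rejections as bad events, and the function names, sample counts, and alert thresholds are all hypothetical.

```python
def burn_rate(bad_events: int, total_events: int, slo: float) -> float:
    """How fast the budget is being spent; 1.0 means exactly on budget."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    if total_events <= 0:
        return 0.0
    error_rate = bad_events / total_events
    return error_rate / (1.0 - slo)


def quota_share_of_burn(quota_rejections: int, other_errors: int) -> float:
    """Fraction of bad events caused by quota clipping rather than real faults."""
    bad = quota_rejections + other_errors
    return quota_rejections / bad if bad else 0.0


# Hypothetical one-hour sample: 10,000 requests, 40 rejected by a quota,
# 10 failed for other reasons, measured against a 99.9% SLO.
rate = burn_rate(bad_events=50, total_events=10_000, slo=0.999)
share = quota_share_of_burn(quota_rejections=40, other_errors=10)

# Assumed thresholds: burning more than 2x budget, mostly due to clipping.
quota_is_burning_budget = rate > 2.0 and share > 0.5
```

In this sample the service is spending budget at roughly five times the sustainable rate, and most of that spend traces back to the quota rather than to a fault in the service itself, which points at a capacity or limit-tuning fix instead of a reliability fix.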