Error Budget vs Resource Quotas
Both put a bound on something, but they bound different things.
Error budget
An error budget is the complement of an SLO. If you commit to 99.9% availability over 28 days, your error budget is the 0.1% of time you are allowed to be unavailable. Twenty-eight days is 40,320 minutes, so that works out to roughly 40 minutes. The budget is not a permission to fail. It is a unit of currency that lets engineering and product talk about reliability investment with the same precision they use for everything else.
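The arithmetic above can be sketched in a few lines of Python. The function name `error_budget` and its parameters are illustrative, not part of any particular SLO tool, and the sketch assumes a purely time-based availability SLO.

```python
from datetime import timedelta


def error_budget(slo_target: float, window: timedelta) -> timedelta:
    """Return the allowed unavailability for an availability SLO.

    slo_target is a fraction (0.999 for 99.9%); window is the SLO period.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    # The budget is the complement of the target, applied to the window.
    return window * (1.0 - slo_target)


budget = error_budget(0.999, timedelta(days=28))
budget_minutes = budget.total_seconds() / 60
print(f"{budget_minutes:.2f} minutes")  # about 40 minutes for 99.9% over 28 days
```

Tightening the target to 99.99% over the same window shrinks the budget by a factor of ten, which is why each extra nine is a much bigger commitment than it looks.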
What makes an error budget different from any other metric:
- Time-bounded: The budget refills on a schedule (rolling 28 days, calendar month, quarter). You spend it down with incidents and bad deploys; you accumulate it back through clean operations. This rhythm forces ongoing accountability instead of one-shot post-incident drama.
- User-visible by definition: The budget tracks failures the user experienced (slow requests, errors, missing data), not internal symptoms. A spike in CPU that does not affect users does not consume budget. A 500 served to a customer always does.
- Spent by deploys, not just incidents: Every risky change has a cost in expected budget. A canary that pushes a small regression for 6 minutes burns budget in proportion to the traffic it touched: up to 6 minutes' worth if every request was affected, far less if only a small slice was. This puts a price on the move-fast-and-break-things tradeoff.
- Triggers governance, not just alerts: When budget burns past a threshold, the response is not a page. It is a feature freeze, a reliability sprint, or a renegotiation of the SLO with stakeholders. The budget is the lever that bends investment toward reliability.
The error budget is the most important number in an SRE practice that is actually working. It bridges the gap between "we want it reliable" and "we are investing in reliability."
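As a rough illustration of budget burn driving governance rather than paging, the sketch below counts user-visible failures against a request-based budget and maps the consumed fraction to an action. The thresholds (`warn_at`, `freeze_at`) and the action names are assumptions made up for this example; real policies are negotiated per team.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    allowed_failures: float   # failures the SLO tolerates in this window
    consumed_fraction: float  # 1.0 means the budget is fully spent
    action: str               # hypothetical governance outcome


def evaluate_budget(total_requests: int, failed_requests: int, slo_target: float,
                    warn_at: float = 0.5, freeze_at: float = 1.0) -> BudgetStatus:
    """Map budget consumption to a governance action (thresholds are illustrative)."""
    allowed = total_requests * (1.0 - slo_target)
    consumed = failed_requests / allowed if allowed > 0 else float("inf")
    if consumed >= freeze_at:
        action = "feature_freeze"
    elif consumed >= warn_at:
        action = "reliability_review"
    else:
        action = "ship_normally"
    return BudgetStatus(allowed, consumed, action)


# 10 million requests at 99.9% tolerate about 10,000 user-visible failures.
healthy = evaluate_budget(10_000_000, 1_000, 0.999)
warning = evaluate_budget(10_000_000, 6_000, 0.999)
frozen = evaluate_budget(10_000_000, 12_000, 0.999)
print(healthy.action, warning.action, frozen.action)
```

Counting requests instead of minutes also makes the canary case explicit: a regression that only reached a small share of traffic adds only that share of failures to `failed_requests`.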
Quota
A resource quota is a hard ceiling on how much of something a service or tenant is allowed to consume at any moment. Typical examples are CPU, memory, requests per second, concurrent connections, and queue depth. Quotas exist for a different reason than budgets: they prevent one consumer from starving others or melting shared infrastructure.
- Concurrent, not cumulative: A quota is "you can have at most N at once," not "you can have N over a window." When you hit the quota, the system rejects or queues additional work. There is no carry-over.
- Per-tenant or per-service: Quotas isolate noisy neighbors. Tenant A blowing through their request quota does not affect Tenant B because Tenant B has their own.
- Enforcement is mechanical: The runtime, the API gateway, or the scheduler enforces the cap directly. There is no human judgment, no escalation. Either the request fits within the quota or it does not.
- Capacity protection, not reliability investment: Quotas keep your infrastructure from collapsing under unexpected load. They are a defensive primitive, not a planning tool.
Quotas are blunt and necessary. Without them, a single bad consumer can take down a shared service for everyone. With them, each consumer has predictable boundaries and the platform survives the worst case.
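A minimal sketch of the "at most N at once, per tenant" behavior described above might look like the following. The class `ConcurrencyQuota` and the tenant names are hypothetical; a production gateway or scheduler would enforce this in its own runtime rather than in application code.

```python
import threading
from collections import defaultdict


class ConcurrencyQuota:
    """Per-tenant cap on in-flight work: reject when full, no carry-over."""

    def __init__(self, limit_per_tenant: int) -> None:
        self._limit = limit_per_tenant
        self._in_flight = defaultdict(int)  # tenant -> current in-flight count
        self._lock = threading.Lock()

    def try_acquire(self, tenant: str) -> bool:
        # Mechanical decision: the request either fits under the cap or it does not.
        with self._lock:
            if self._in_flight[tenant] >= self._limit:
                return False
            self._in_flight[tenant] += 1
            return True

    def release(self, tenant: str) -> None:
        with self._lock:
            if self._in_flight[tenant] > 0:
                self._in_flight[tenant] -= 1


quota = ConcurrencyQuota(limit_per_tenant=2)
tenant_a = [quota.try_acquire("tenant-a") for _ in range(3)]  # third call is rejected
tenant_b = quota.try_acquire("tenant-b")  # unaffected by tenant-a hitting its cap
quota.release("tenant-a")
tenant_a_retry = quota.try_acquire("tenant-a")  # fits again once a slot is freed
print(tenant_a, tenant_b, tenant_a_retry)
```

Each tenant has its own counter, so one tenant exhausting its slots never changes the answer another tenant gets.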
Layer
Error budgets and resource quotas are not alternatives. They live at different layers and answer different questions, and a serious operational practice uses both.
- Quotas at the infrastructure layer: Every service has CPU, memory, RPS, and connection caps that prevent runaway consumption. The cap is set generously enough that normal traffic never bumps into it, tightly enough that abnormal traffic gets clipped before it spreads.
- Budgets at the SLO layer: Every customer-facing service has an error budget that governs the deploy/freeze decision. Budget burn is the signal that drives "we need to slow down and invest in reliability for two weeks."
- Different audiences: Quotas are an SRE/platform concern. They show up on capacity dashboards and are tuned by the team that owns the runtime. Budgets are an engineering and product concern. They show up on reliability dashboards and drive prioritization conversations with the PM.
- Different cadences: Quotas change infrequently (when capacity is added or a tenant's profile shifts). Budgets are watched continuously and reset on the SLO window.
- The signals overlap when they fail: A quota that is undersized will burn budget every time it clips legitimate traffic. A budget that burns too fast will sometimes reveal a quota that is too generous and lets pathological consumers through. Looking at both together is how you find the misconfigurations.
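One way to look at both signals together is to ask, per service, how much of the budget burn comes from quota rejections. The sketch below covers only the undersized-quota case, and it assumes you can separate quota rejections from other user-visible failures; the `WindowStats` fields, service names, and the 50% `share_threshold` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    service: str
    total_requests: int
    failed_requests: int   # all user-visible failures in the SLO window
    quota_rejections: int  # subset of failures caused by a quota clip


def quota_driven_burn(stats, slo_target: float, share_threshold: float = 0.5):
    """Flag services whose budget burn is mostly explained by quota rejections."""
    flagged = []
    for s in stats:
        allowed = s.total_requests * (1.0 - slo_target)
        if allowed <= 0 or s.failed_requests == 0:
            continue
        burn = s.failed_requests / allowed               # fraction of budget spent
        share = s.quota_rejections / s.failed_requests   # part caused by clipping
        if share >= share_threshold:
            flagged.append((s.service, round(burn, 2), round(share, 2)))
    return flagged


window = [
    WindowStats("checkout", 1_000_000, 800, 600),  # most failures are quota clips
    WindowStats("search", 1_000_000, 300, 10),     # failures come from elsewhere
]
flagged = quota_driven_burn(window, slo_target=0.999)
print(flagged)
```

A flagged service is a candidate for a quota review rather than a reliability sprint, since raising or reshaping the cap may recover budget without any code change.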
Quotas keep the system standing up. Error budgets keep the team aimed at the right reliability target. Treating them as the same thing is how teams end up with a great capacity story and a missed SLO. Nova AI Ops tracks both layers in one view: per-tenant quota saturation alongside per-service error budget burn, with alerting tuned to surface the pattern where the two cross over (a clipped quota that burns budget) before it becomes a customer escalation.