Error Budget Gate is the policy layer between the agents and the production cluster. When an SLO is over budget, the gate blocks risky action classes (deploys, schema changes, scale-downs) until the budget recovers. No more agents shipping fixes during the same hour the SLO is on fire.
Each service's gate has three states. Open: the SLO is healthy and agents act normally. Partial: the budget is burning faster than target, so only the highest-risk classes are blocked. Closed: the budget is exhausted, all risky classes are blocked, and only an emergency override can act.
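The three states can be sketched as a small function. This is only an illustration, not the product's implementation: the names `GateState` and `gate_state`, the inputs (`budget_remaining` as a fraction of the window's budget, `burn_rate` relative to a target), and the default target of 1.0 are all assumptions.

```python
from enum import Enum


class GateState(Enum):
    OPEN = "open"
    PARTIAL = "partial"
    CLOSED = "closed"


def gate_state(budget_remaining: float, burn_rate: float,
               target_burn_rate: float = 1.0) -> GateState:
    """Derive a gate state from two hypothetical inputs.

    budget_remaining: fraction of the error budget left (0.0 = exhausted).
    burn_rate: current budget consumption rate; values above
               target_burn_rate mean the budget is burning too fast.
    """
    if budget_remaining <= 0.0:
        # Budget exhausted: all risky classes are blocked.
        return GateState.CLOSED
    if burn_rate > target_burn_rate:
        # Budget left, but burning faster than target.
        return GateState.PARTIAL
    return GateState.OPEN
```

For example, `gate_state(0.5, 2.0)` yields `GateState.PARTIAL`: half the budget is left, but it is being spent at twice the target rate.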
Define which agent action classes are gated by which SLOs. The default mapping covers the common cases (deploys gated by the latency SLO, schema changes gated by the availability SLO). Add your own mappings for product-specific classes (e.g., a "marketing-blast" class gated by your email-deliverability SLO).
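One way to picture the mapping is a dictionary from action class to gating SLO, with custom entries layered on the defaults. The sketch below is hypothetical: the key names, the `highest_risk` set, and the choice to treat an unknown SLO state as open are assumptions made for illustration, not the product's configuration format.

```python
# Default mapping described above: action class -> gating SLO (names assumed).
DEFAULT_MAPPING = {
    "deploy": "latency",
    "schema-change": "availability",
}


def build_mapping(custom=None):
    """Return the defaults with product-specific mappings layered on top."""
    mapping = dict(DEFAULT_MAPPING)
    mapping.update(custom or {})
    return mapping


def is_blocked(action_class, mapping, slo_states, highest_risk=frozenset()):
    """Decide whether an action class is blocked right now.

    slo_states maps SLO name -> "open" | "partial" | "closed".
    highest_risk is the (assumed) set of classes blocked in the partial state.
    """
    slo = mapping.get(action_class)
    if slo is None:
        return False  # class is not gated by any SLO
    state = slo_states.get(slo, "open")  # assumption: unknown state = open
    if state == "closed":
        return True
    if state == "partial":
        return action_class in highest_risk
    return False


mapping = build_mapping({"marketing-blast": "email-deliverability"})
```

With this mapping, a "marketing-blast" action is blocked whenever the email-deliverability gate is closed, while the deploy and schema-change defaults stay in place. A real deployment might prefer to fail closed when an SLO's state is unknown.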
Sometimes the right thing is to deploy through a closed gate, for example to ship a fix that you believe will recover the SLO. The override path requires two-person approval (engineer + team-lead, or two-person on-call) and a written justification, and it records the override in Agent Ledger. Auditable, not bureaucratic.
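The override rules can be expressed as a short check. This is a sketch under stated assumptions: the `Approver` type, the role strings, and the use of a plain Python list as a stand-in for Agent Ledger are hypothetical, since the Agent Ledger API is not described here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Approver:
    name: str
    role: str  # assumed role labels: "engineer", "team-lead", "on-call"


# Accepted approver pairs, stored in sorted order for comparison.
VALID_ROLE_PAIRS = (("engineer", "team-lead"), ("on-call", "on-call"))


def override_allowed(approvers, justification):
    """Two distinct people, an accepted role pair, and a non-empty justification."""
    if len(approvers) != 2 or approvers[0].name == approvers[1].name:
        return False
    if not justification.strip():
        return False
    roles = tuple(sorted(a.role for a in approvers))
    return roles in VALID_ROLE_PAIRS


def record_override(ledger, service, approvers, justification):
    """Append an audit entry to `ledger` (a list standing in for Agent Ledger)."""
    if not override_allowed(approvers, justification):
        raise PermissionError("override needs two valid approvers and a justification")
    entry = {
        "service": service,
        "signed_by": [a.name for a in approvers],
        "justification": justification,
        "at": datetime.now(timezone.utc).isoformat(),
    }
    ledger.append(entry)
    return entry
```

Keeping the check and the ledger write in one call means an override cannot take effect without leaving an audit entry.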
The gate produces a weekly report: which services hit partial, which hit closed, how many overrides happened, who signed them, and whether the gate caused or prevented any incidents. Use the report to tune your SLO targets: a gate that closes every week is too tight, and a gate that never closes is too loose.
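The tuning rule of thumb above can be turned into a simple check over past reports. The sketch assumes a hypothetical input, a list with the number of times a service's gate closed in each recent week. The report's actual format is not specified here.

```python
def tuning_hint(closures_per_week):
    """Classify a gate from weekly closure counts (hypothetical input shape)."""
    if not closures_per_week:
        return "no data"
    if all(count > 0 for count in closures_per_week):
        return "too tight"  # closed every week in the sample
    if all(count == 0 for count in closures_per_week):
        return "too loose"  # never closed in the sample
    return "ok"
```

A longer sample gives a more trustworthy hint. A single quiet week is not evidence that a gate is too loose.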
Stop relying on culture to enforce error budgets. The gate makes the policy automatic, while leaving an emergency override for when you need one.