AI Safety & Governance

When the error budget is gone, risky actions stop happening

Error Budget Gate is the policy layer between the agents and the production cluster. When an SLO is over budget, the gate blocks risky action classes (deploys, schema changes, scale-downs) until the budget recovers. No more agents shipping fixes during the same hour the SLO is on fire.

  • 3 gate states (open, partial, closed)
  • < 1s decision latency
  • Per-SLO configurable rules
  • Override with two-person approval

Three Gate States

Open, partial, closed: matched to budget health

Each service's gate has three states. Open: the SLO is healthy and agents act normally. Partial: the budget is burning faster than target, so only the highest-risk classes are blocked. Closed: the budget is exhausted, all risky classes are blocked, and only the emergency override can act.

  • Open: more than 50% of the budget remaining; agents and humans act normally, no friction
  • Partial: between 0% and 50% of the budget remaining; the highest-risk classes (deploys, schema changes, scale-downs) are blocked, low-risk classes are still allowed
  • Closed: budget exhausted; all risky classes are blocked, and only the two-person override can act
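
The thresholds above can be sketched in a few lines of Python. This is a minimal illustration, not Nova's implementation: the names `GateState`, `budget_remaining`, and `gate_state` are hypothetical, and treating exactly 50% as partial is an assumption read off the bullets.

```python
from enum import Enum


class GateState(Enum):
    OPEN = "open"
    PARTIAL = "partial"
    CLOSED = "closed"


def budget_remaining(slo_target: float, good: int, total: int) -> float:
    """Fraction of the error budget left in the window (1.0 = untouched).

    slo_target is the SLO as a fraction, e.g. 0.999 for 99.9%.
    The result goes negative once the budget is overspent.
    """
    if total == 0:
        return 1.0  # no traffic yet, nothing spent
    allowed_bad = (1.0 - slo_target) * total
    actual_bad = total - good
    if allowed_bad <= 0:
        # A 100% target leaves no budget: any failure exhausts it.
        return 1.0 if actual_bad == 0 else 0.0
    return 1.0 - actual_bad / allowed_bad


def gate_state(remaining: float) -> GateState:
    """Map remaining budget to a gate state using the 50% / 0% thresholds."""
    if remaining <= 0.0:
        return GateState.CLOSED   # budget exhausted
    if remaining > 0.5:
        return GateState.OPEN     # more than 50% left
    return GateState.PARTIAL      # 0% < remaining <= 50% (boundary is an assumption)
```

For example, a 99.9% SLO over 100,000 requests allows roughly 100 failures; 80 failures leaves about 20% of the budget, which lands in the partial state.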
Action Class Mapping

You decide which classes the gate guards

Define which agent action classes are gated by which SLOs. The default mapping covers the common cases (deploys gated by the latency SLO, schema changes gated by the availability SLO). Add your own mappings for product-specific classes (e.g., a "marketing-blast" class gated by your email-deliverability SLO).

  • Sensible defaults: deploys → latency, schema → availability, scale-down → saturation; works on day one
  • Per-service override: override the mapping for tier-0 services where you want stricter gating
  • Custom action classes: register your own classes (e.g., "marketing-blast", "feature-flag-flip") and pick the SLO that gates them
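
One way to picture the mapping is as a lookup with three layers: defaults, custom classes, and per-service overrides. The sketch below is illustrative only; `GateMapping` and its methods are hypothetical names, not Nova's API, and the SLO identifiers are plain strings chosen for the example.

```python
from typing import Dict, Optional, Tuple

# Defaults taken from the list above: action class -> SLO that gates it.
DEFAULT_MAPPING: Dict[str, str] = {
    "deploy": "latency",
    "schema-change": "availability",
    "scale-down": "saturation",
}


class GateMapping:
    """Resolves which SLO gates a given action class for a given service."""

    def __init__(self, defaults: Optional[Dict[str, str]] = None) -> None:
        self._classes = dict(DEFAULT_MAPPING if defaults is None else defaults)
        self._service_overrides: Dict[Tuple[str, str], str] = {}

    def register_class(self, action_class: str, slo: str) -> None:
        """Add a custom action class, or re-point an existing one."""
        self._classes[action_class] = slo

    def override_for_service(self, service: str, action_class: str, slo: str) -> None:
        """Use a different gating SLO for one service only."""
        self._service_overrides[(service, action_class)] = slo

    def gating_slo(self, service: str, action_class: str) -> Optional[str]:
        """Return the gating SLO, or None if the class is not gated."""
        override = self._service_overrides.get((service, action_class))
        if override is not None:
            return override
        return self._classes.get(action_class)
```

Resolving the per-service override first means a tier-0 service can be gated more strictly without touching the defaults every other service relies on.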
Override Path

When you really do need to deploy through a gate

Sometimes the right thing is to deploy through a closed gate: for example, to ship a fix that you believe will recover the SLO. The override path requires two-person approval (engineer + team-lead, or two-person on-call) and a written justification, and it writes the override to Agent Ledger. Auditable, not bureaucratic.

  • Two-person approval: the override is the only way through a closed gate and requires two distinct signers
  • Written justification: the override row in the ledger has a free-text reason and links to the proposed change
  • Auto-page if it makes things worse: if the override pushes the SLO further from target, on-call is paged automatically
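
The override checks can be sketched as two small functions. Everything here is a hypothetical illustration: `OverrideRequest`, `validate_override`, and `should_page` are invented names, the role pairing (engineer + team-lead) is left out, and "further from target" is interpreted as a growing shortfall below the SLO target, which is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OverrideRequest:
    service: str
    signers: Tuple[str, ...]   # identities of the approvers
    justification: str         # free-text reason
    change_url: str            # link to the proposed change


def validate_override(req: OverrideRequest) -> List[str]:
    """Return the reasons the override is not acceptable (empty list = OK)."""
    problems: List[str] = []
    if len(set(req.signers)) < 2:
        problems.append("needs two distinct signers")
    if not req.justification.strip():
        problems.append("needs a written justification")
    if not req.change_url.strip():
        problems.append("needs a link to the proposed change")
    return problems


def should_page(target: float, sli_before: float, sli_after: float) -> bool:
    """Page on-call if the SLI fell further below target after the override."""
    shortfall_before = max(0.0, target - sli_before)
    shortfall_after = max(0.0, target - sli_after)
    return shortfall_after > shortfall_before
```

Using a set for the signer check means the same person approving twice still counts as one signer.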
Reporting

See how often the gate fires and why

The gate produces a weekly report: which services hit partial, which hit closed, how many overrides happened, who signed them, and whether the gate caused or prevented any incidents. Use the report to tune your SLO targets: a gate that closes every week is too tight; a gate that never closes is too loose.

  • Per-service trend: gate state over time per service so you can see whether your SLO targets are realistic
  • Override audit: every override with signers, reason, and downstream incident count is shown together
  • Tuning recommendation: when a gate is closed for more than 10% of the week, Nova suggests a target review
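
The 10% rule can be worked through as simple interval arithmetic. This is a sketch under assumptions: the function names are hypothetical, closed periods are given as non-overlapping (start, end) pairs, and the week is a fixed seven-day window starting at `week_start`.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

WEEK = timedelta(days=7)
REVIEW_THRESHOLD = 0.10  # "closed more than 10% of the week"


def closed_fraction(
    closed_intervals: Iterable[Tuple[datetime, datetime]],
    week_start: datetime,
) -> float:
    """Fraction of the week the gate spent closed.

    Intervals are clipped to the week; they are assumed not to overlap.
    """
    week_end = week_start + WEEK
    closed = timedelta()
    for start, end in closed_intervals:
        clipped_start = max(start, week_start)
        clipped_end = min(end, week_end)
        if clipped_end > clipped_start:
            closed += clipped_end - clipped_start
    return closed / WEEK  # timedelta / timedelta yields a float


def needs_target_review(fraction_closed: float) -> bool:
    """True when the closed time exceeds the review threshold."""
    return fraction_closed > REVIEW_THRESHOLD
```

A week has 168 hours, so the threshold works out to 16.8 hours: a gate closed for a full day (about 14% of the week) would trigger a review suggestion, while one closed for 12 hours (about 7%) would not.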

Reliability policy as code, enforced in real time

Stop relying on culture to enforce error budgets. The gate makes the policy automatic, while leaving an emergency override for when you need one.
