AI Safety & Governance

When the error budget is gone, risky actions stop happening

Error Budget Gate is the policy layer between your agents and the production cluster. When a service's SLO runs out of error budget, the gate blocks risky action classes (deploys, schema changes, scale-downs) until the budget recovers. No more agents shipping fixes during the same hour the SLO is on fire.

app.novaaiops.com / error-budget-gate

Gate status · 4 services

payments: over budget · gate closed

blocked: deploys, schema, scale-down

orders: 42% left · partial

blocked: schema only

identity: 98% left · open

blocked: none

Three Gate States

Open, partial, closed: matched to budget health

Each service's gate has three states. Open: the SLO is healthy and agents act normally. Partial: the budget is burning faster than target, so only the highest-risk classes are blocked. Closed: the budget is exhausted, all risky classes are blocked, and only the emergency override can act.

✓ Open: > 50% budget remaining, agents and humans act normally, no friction
✓ Partial: between 0% and 50% remaining, only the highest-risk classes (for example, schema changes) are blocked, lower-risk classes still allowed
✓ Closed: budget exhausted, all risky classes (deploys, schema, scale-down) blocked, only the two-person override can act
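
The thresholds above can be sketched as a small function. This is an illustrative sketch only, not Nova's implementation: the names `GateState` and `gate_state` are hypothetical, the budget is assumed to arrive as a fraction (0.42 for 42% left), and treating exactly 50% as partial is an assumption the page does not spell out.

```python
from enum import Enum


class GateState(Enum):
    OPEN = "open"
    PARTIAL = "partial"
    CLOSED = "closed"


def gate_state(budget_remaining: float) -> GateState:
    """Map the fraction of error budget left to a gate state.

    budget_remaining: 1.0 means the full budget is left, 0.0 means exhausted.
    Negative values (over budget) are treated as exhausted.
    """
    if budget_remaining <= 0.0:
        return GateState.CLOSED
    if budget_remaining <= 0.5:  # assumption: exactly 50% counts as partial
        return GateState.PARTIAL
    return GateState.OPEN


if __name__ == "__main__":
    # Values taken from the gate status panel on this page.
    for service, left in [("payments", 0.0), ("orders", 0.42), ("identity", 0.98)]:
        print(service, gate_state(left).value)
```

With the figures from the status panel, payments maps to closed, orders to partial, and identity to open.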

app.novaaiops.com / error-budget-gate · states

State machine

enter: open · budget > 50%

→ partial: budget 50% → 0% · highest-risk classes blocked

→ closed: budget = 0% · all risky classes blocked

→ open: budget recovers via window roll-off

Action Class Mapping

You decide which classes the gate guards

Define which agent action classes are gated by which SLOs. Default mapping covers the common cases (deploys gated by latency SLO, schema changes gated by availability SLO). Add your own mappings for product-specific classes (e.g., a "marketing-blast" class gated by your email-deliverability SLO).

✓ Sensible defaults: deploys → latency, schema → availability, scale-down → saturation; works on day one
✓ Per-service override: override the mapping for tier-0 services where you want stricter gating
✓ Custom action classes: register your own classes (e.g., "marketing-blast", "feature-flag-flip") and pick the SLO that gates them
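
One way to picture the mapping is a default table with per-service entries layered on top. The sketch below is hypothetical: `DEFAULT_MAPPING`, `SERVICE_OVERRIDES`, and `gating_slo` are illustrative names, not Nova's configuration format, and the SLO labels are shortened from the examples on this page.

```python
from typing import Dict, Optional

# Default mapping of action class -> gating SLO, as described above.
DEFAULT_MAPPING: Dict[str, str] = {
    "deploy": "latency",
    "schema-migration": "availability",
    "scale-down": "saturation",
}

# Hypothetical per-service additions, including a custom action class.
SERVICE_OVERRIDES: Dict[str, Dict[str, str]] = {
    "payments": {
        "secret-rotate": "availability",
        "marketing-blast": "email-success",
    },
}


def gating_slo(service: str, action_class: str) -> Optional[str]:
    """Return the SLO that gates this action class, or None if it is ungated."""
    # Service-specific entries win over the defaults.
    merged = {**DEFAULT_MAPPING, **SERVICE_OVERRIDES.get(service, {})}
    return merged.get(action_class)


if __name__ == "__main__":
    print(gating_slo("payments", "deploy"))
    print(gating_slo("payments", "marketing-blast"))
    print(gating_slo("orders", "marketing-blast"))
```

In this sketch an action class with no mapping returns `None`, meaning the gate does not guard it; a stricter setup might instead refuse unmapped classes by default.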

app.novaaiops.com / error-budget-gate · mapping

Mapping · payments

deploy → p95 latency SLO

schema-migration → availability SLO

scale-down → saturation SLO

secret-rotate → availability SLO

marketing-blast → email-success SLO (custom)

Override Path

When you really do need to deploy through a gate

Sometimes the right thing is to deploy through a closed gate: for example, to ship a fix that you believe will recover the SLO. The override path requires two-person approval (engineer + team lead, or two on-call engineers) and a written justification, and it writes the override to Agent Ledger. Auditable, not bureaucratic.

✓ Two-person approval: override is the only way through a closed gate, and requires two distinct signers
✓ Written justification: override row in the ledger has a free-text reason and links to the proposed change
✓ Auto-page if it makes things worse: if the override pushes the SLO further from target, on-call is paged automatically
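
The two override requirements (two distinct signers, a written reason) can be sketched as a validation step. Everything here is hypothetical: `OverrideRequest` and `validate_override` are illustrative names, and how Nova actually identifies signers or stores the ledger row is not described on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OverrideRequest:
    request_id: str
    service: str
    action: str
    reason: str
    signers: Tuple[str, ...]


def validate_override(req: OverrideRequest) -> List[str]:
    """Return a list of problems; an empty list means the override may proceed."""
    problems: List[str] = []
    # Normalise names so "Marc" and "marc " count as the same signer.
    distinct = {s.strip().lower() for s in req.signers if s.strip()}
    if len(distinct) < 2:
        problems.append("two distinct signers required")
    if not req.reason.strip():
        problems.append("written justification required")
    return problems


if __name__ == "__main__":
    # Field values mirror the example override panel on this page.
    req = OverrideRequest(
        request_id="req-3201",
        service="payments",
        action="deploy payments@e6d2a9",
        reason="fix for inc-4821",
        signers=("marc", "sarah"),
    )
    print(validate_override(req))
```

Returning a list of problems rather than a boolean lets the caller show every missing requirement at once.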

app.novaaiops.com / error-budget-gate · override

Override · req-3201

service: payments

action: deploy · payments@e6d2a9

reason: fix for inc-4821

signer 1: marc

signer 2: sarah

auto-page on regression: armed

Reporting

See how often the gate fires and why

The gate produces a weekly report: which services hit partial, which hit closed, how many overrides happened, who signed them, and whether the gate caused or prevented any incidents. Use the report to tune your SLO targets: a gate that closes every week is too tight, and a gate that never closes is too loose.

✓ Per-service trend: gate state over time per service so you can see whether your SLO targets are realistic
✓ Override audit: every override is shown together with its signers, reason, and downstream incident count
✓ Tuning recommendation: when a gate is closed for more than 10% of the week, Nova suggests a target review
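
The 10% rule can be read as "time spent closed divided by the length of the week". That reading is an assumption, as are the names `closed_fraction` and `needs_target_review`; the sketch below only illustrates the arithmetic.

```python
from datetime import timedelta
from typing import Iterable

WEEK = timedelta(days=7)
CLOSED_FRACTION_THRESHOLD = 0.10  # 10% of a week is 16.8 hours


def closed_fraction(closed_durations: Iterable[timedelta]) -> float:
    """Fraction of the week the gate spent closed."""
    total = sum(closed_durations, timedelta())
    return total / WEEK  # timedelta / timedelta yields a float


def needs_target_review(closed_durations: Iterable[timedelta]) -> bool:
    """True when the gate was closed for more than 10% of the week."""
    return closed_fraction(closed_durations) > CLOSED_FRACTION_THRESHOLD


if __name__ == "__main__":
    # Hypothetical week: two closures of 12 and 8 hours (20 hours in total).
    closures = [timedelta(hours=12), timedelta(hours=8)]
    print(needs_target_review(closures))
```

Twenty hours closed exceeds the 16.8-hour threshold, so this hypothetical week would trigger a review suggestion.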

app.novaaiops.com / error-budget-gate · report

Weekly · 4 services

opens

partials

closeds

overrides

most-gated service: payments (3 closes this month)

tuning hint: payments target may be too tight

Reliability policy as code, enforced in real time

Stop relying on culture to enforce error budgets. The gate makes the policy automatic, while leaving an emergency override for when you need one.
