SRE Agent Guardrails: A Defense-in-Depth Checklist

Eleven independent guardrails, each with a different failure model. The checklist, what each catches, and the order to add them as your agent matures.

The 11 layers

Defense in depth is the right shape because no single layer catches every failure. The 11 layers below are the set that production agent platforms tend to converge on.

Layers 1-3. Read-only by default, tool allowlist, pre-flight checks. The minimal foundation; an agent without these is unsafe at any scale.
Layers 4-7. Two-person approval for risky actions; action caps per run, per service, and per tenant; a stagger between actions; sandbox-first for irreversible actions.
Layers 8-10. Loop detection, cost budget enforcement, confidence threshold for action. The layers that protect against the agent’s own pathologies.
Layer 11. Audit log. Least exciting, most important. The audit log is what lets you debug, learn from, and prove anything about the other 10.
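A minimal sketch of how a few of these layers might compose, assuming each layer is an independent check that either refuses an action with a reason or lets it pass. Every name here (Action, Verdict, Guardrails, read_only, tool_allowlist, action_cap) is hypothetical and chosen for illustration; only layers 1, 2, 5 (per-service cap only), and 11 are shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    tool: str
    service: str
    writes: bool = False


@dataclass
class Verdict:
    allowed: bool
    layer: Optional[str] = None  # name of the layer that refused, if any
    reason: str = ""


# A layer returns a refusal reason, or None to let the action pass.
Layer = Callable[[Action], Optional[str]]


def read_only(write_enabled: bool) -> Layer:
    """Layer 1: refuse any write unless writes were explicitly enabled."""
    def check(action: Action) -> Optional[str]:
        if action.writes and not write_enabled:
            return "writes are disabled by default"
        return None
    return check


def tool_allowlist(allowed: set[str]) -> Layer:
    """Layer 2: refuse any tool that is not explicitly listed."""
    def check(action: Action) -> Optional[str]:
        if action.tool not in allowed:
            return f"tool {action.tool!r} is not on the allowlist"
        return None
    return check


def action_cap(max_per_service: int) -> Layer:
    """Layer 5 (per-service part only): cap actions per service in one run."""
    counts: dict[str, int] = {}

    def check(action: Action) -> Optional[str]:
        used = counts.get(action.service, 0)
        if used >= max_per_service:
            return f"cap of {max_per_service} reached for {action.service}"
        # Counted when the action passes this layer, so keep the cap last.
        counts[action.service] = used + 1
        return None
    return check


@dataclass
class Guardrails:
    layers: list[tuple[str, Layer]]
    audit_log: list[dict] = field(default_factory=list)

    def evaluate(self, action: Action) -> Verdict:
        verdict = Verdict(allowed=True)
        for name, layer in self.layers:
            reason = layer(action)
            if reason is not None:
                verdict = Verdict(False, name, reason)
                break  # the first refusing layer wins
        # Layer 11: record every decision, allowed or refused.
        self.audit_log.append({
            "tool": action.tool,
            "service": action.service,
            "allowed": verdict.allowed,
            "layer": verdict.layer,
            "reason": verdict.reason,
        })
        return verdict
```

Recording the refusing layer's name in the audit entry is one way to make later per-layer reporting possible; a real platform would likely persist the log rather than keep it in memory.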

Each layer covers different failures

The layers are independent on purpose. A failure that slips past one layer is caught by the next; a failure that slips past all 11 is exceedingly rare.

Layer 1 (read-only) prevents whole categories. An agent that cannot write cannot misbehave through writes.
Layer 4 (approval) catches subtle misses. These are the cases that layers 2 and 3 allowed through.
Layer 11 (audit) is post-failure. The only layer that helps after a failure; the rest are preventive.
Independence test. If two layers always fail together on the same eval case, one of them is redundant; the layers are designed not to correlate.
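The independence test can be approximated mechanically. The sketch below assumes you already record, per layer, the set of eval case IDs on which that layer failed; the function name correlated_layers and the input shape are assumptions for illustration. It flags only the strict case described above, where two layers fail on exactly the same cases.

```python
from itertools import combinations


def correlated_layers(misses: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return pairs of layers that failed on exactly the same eval cases.

    misses maps a layer name to the set of eval case IDs it failed on.
    Layers with no failures are never flagged.
    """
    flagged = []
    for a, b in combinations(sorted(misses), 2):
        if misses[a] and misses[a] == misses[b]:
            flagged.append((a, b))
    return flagged


# Illustrative data only, not real eval results.
example = {
    "allowlist": {"case-1", "case-4"},
    "preflight": {"case-1", "case-4"},
    "approval": {"case-2"},
    "cap": set(),
}
print(correlated_layers(example))
```

A flagged pair is a prompt for review, not proof of redundancy: with a small eval set, two layers can share a failure set by coincidence.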

Order to add them

Adding all 11 at once stalls the rollout. The order below ships a safe-but-useless agent on day one and adds capability layer by layer with evidence.

Day one. Layers 1, 2, 11. Read-only, tool allowlist, audit log. A safe agent that does nothing visibly useful yet.
Weeks 2-4. Layers 3, 5, 9. Pre-flight, caps, cost budget. Now the agent can act, but within bounds.
Months 2-3. Layers 4, 6, 7. Approval, stagger, sandbox. The agent takes risky actions only with explicit gating.
As needed. Layers 8 (loop detection) and 10 (confidence threshold). Add when production data shows a need; do not add them preemptively.
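The rollout order above can be kept as data so the set of enabled layers at each stage is explicit and checkable. This is a sketch only; LAYERS, ROLLOUT, and enabled_through are hypothetical names, and the stage labels simply mirror the list above.

```python
LAYERS = {
    1: "read-only by default",
    2: "tool allowlist",
    3: "pre-flight checks",
    4: "two-person approval",
    5: "action caps",
    6: "stagger between actions",
    7: "sandbox-first",
    8: "loop detection",
    9: "cost budget enforcement",
    10: "confidence threshold",
    11: "audit log",
}

# Stages in the order they are added; each stage lists the layers it introduces.
ROLLOUT = [
    ("day one", [1, 2, 11]),
    ("weeks 2-4", [3, 5, 9]),
    ("months 2-3", [4, 6, 7]),
    ("as needed", [8, 10]),
]


def enabled_through(stage: str) -> list[int]:
    """Return every layer number enabled once the given stage is complete."""
    enabled: list[int] = []
    for name, layers in ROLLOUT:
        enabled.extend(layers)
        if name == stage:
            return sorted(enabled)
    raise ValueError(f"unknown stage: {stage!r}")


for number in enabled_through("weeks 2-4"):
    print(number, LAYERS[number])
```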

When 11 is too many

Not every agent needs every layer. The cases below show where the full set is overkill, where it is not, and how to decide.

Bounded internal agents. A tool that fills a Jira ticket from logs can drop to 4 or 5 layers. The high-stakes layers are not warranted at this scope.
Customer-facing production. Needs all 11. Experience shows that anything less than the full set eventually gets compromised.
Default to all 11. The cost of an extra layer is minor; the cost of a missing layer can be a postmortem.
Drop with evidence. Removing a layer requires production evidence that it has not fired in a quarter. Removing on hunch is how outages enter.

Testing the layers

Untested layers are decoration. The eval set must exercise each layer individually and the layers in combination.

Per-layer eval. Each layer has at least one eval case that fires it. “This case should trip the cap” passes if the cap fires.
Combined cases. These test layer interactions. “This case requires layer 3 to refuse, then layer 4 to escalate, then layer 11 to record.” The case passes only if all three happen.
Quarterly red team. Design scenarios specifically to slip through the layers; close any gaps the exercise surfaces.
Layer-coverage report. Track which layers fired this quarter. Layers that have not fired in two quarters get reviewed for relevance.
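A layer-coverage report can be derived from the audit log. The sketch below assumes each audit entry can be reduced to a (quarter, layer) pair for the layer that fired; coverage_report and needs_review are hypothetical names, and the sample events are illustrative, not real data.

```python
from collections import Counter
from typing import Iterable


def coverage_report(
    events: Iterable[tuple[str, str]],
    layers: list[str],
    quarters: list[str],
) -> dict[str, list[int]]:
    """Count how often each layer fired in each quarter.

    events is an iterable of (quarter, layer) pairs taken from the audit log.
    The result maps each layer to its counts, in the order of quarters.
    """
    fired = Counter(events)
    return {layer: [fired[(q, layer)] for q in quarters] for layer in layers}


def needs_review(report: dict[str, list[int]]) -> list[str]:
    """Layers that did not fire in the two most recent quarters."""
    return [
        layer
        for layer, counts in report.items()
        if len(counts) >= 2 and sum(counts[-2:]) == 0
    ]


# Illustrative events only.
events = [
    ("Q1", "cap"),
    ("Q2", "cap"),
    ("Q1", "loop_detection"),
    ("Q3", "approval"),
]
report = coverage_report(
    events,
    layers=["cap", "loop_detection", "approval"],
    quarters=["Q1", "Q2", "Q3"],
)
print(needs_review(report))
```

A layer that has not fired is a candidate for review, not automatic removal: it may be quiet because an earlier layer catches the same cases first, or because the failure it guards against is rare.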