SRE Agent Guardrails: A Defense-in-Depth Checklist
Eleven independent guardrails, each with a different failure model. The checklist, what each catches, and the order to add them as your agent matures.
The 11 layers
1. Read-only by default.
2. Tool allowlist.
3. Pre-flight checks.
4. Two-person approval for risky actions.
5. Action caps per run / per service / per tenant.
6. Stagger between actions.
7. Sandbox-first for irreversible actions.
8. Loop detection.
9. Cost budget enforcement.
10. Confidence threshold for action.
11. Audit log.
The 11th is the least exciting and the most important: it is what lets you debug, learn from, and prove anything about the other 10.
Each layer covers different failures
Layer 1 prevents whole categories of harm by limiting what the agent can do.
Layer 4 catches the cases where layers 2 and 3 missed something subtle.
Layer 11 is the only layer that helps after a failure; the others are preventive.
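As a rough illustration of how layers with different failure models can sit in one decision path, the sketch below chains layers 1, 2, and 4 and writes every verdict to an audit log (layer 11). The names Action, Guardrails, and check, and the verdict strings, are hypothetical and chosen only for this example; a real agent would wire these checks into its own tool-calling path.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    tool: str                 # hypothetical tool name requested by the agent
    mutating: bool = False    # does the action change state?
    risky: bool = False       # does it need two-person approval?
    approvers: tuple = ()     # who has signed off so far


@dataclass
class Guardrails:
    read_only: bool = True                          # layer 1
    allowlist: frozenset = frozenset()              # layer 2
    audit_log: list = field(default_factory=list)   # layer 11

    def check(self, action: Action) -> str:
        if action.tool not in self.allowlist:
            verdict = "deny: tool not on allowlist"
        elif action.mutating and self.read_only:
            verdict = "deny: agent is read-only"
        elif action.risky and len(set(action.approvers)) < 2:
            verdict = "escalate: needs two approvers"   # layer 4
        else:
            verdict = "allow"
        # Every decision is recorded, whether allowed or not.
        self.audit_log.append((action.tool, verdict))
        return verdict
```

Each branch fails for a different reason, and the log is appended to regardless of the verdict, which is what makes the other checks debuggable afterwards.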
Order to add them
Day one: layers 1 (read-only), 2 (tool allowlist), 11 (audit log). These three give you a safe agent that can observe and report but cannot change anything.
Weeks 2-4: layers 3 (pre-flight), 5 (caps), 9 (cost budget). Now the agent can act, but within bounds.
Months 2-3: layers 4 (approval), 6 (stagger), 7 (sandbox). The agent can take risky actions, but only with explicit gating.
As needed: layers 8 (loop detection), 10 (confidence threshold). Add when production data shows a need.
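The rollout order above can be captured as data, so that the set of active layers at each stage is explicit and checkable. This is a minimal sketch: ROLLOUT and enabled_layers are hypothetical names, and the stage labels are just strings picked for the example.

```python
# Stages in the order described above, each with the layers it introduces.
ROLLOUT = [
    ("day one", (1, 2, 11)),
    ("weeks 2-4", (3, 5, 9)),
    ("months 2-3", (4, 6, 7)),
    ("as needed", (8, 10)),
]


def enabled_layers(stage: str) -> list[int]:
    """Return all layers active once `stage` is complete (cumulative)."""
    active: list[int] = []
    for name, layers in ROLLOUT:
        active.extend(layers)
        if name == stage:
            return sorted(active)
    raise ValueError(f"unknown stage: {stage!r}")
```

For instance, enabled_layers("weeks 2-4") returns [1, 2, 3, 5, 9, 11]: the day-one layers plus the three that let the agent act within bounds.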
When 11 is too many
Bounded internal agents (for example, a tool that fills a Jira ticket from logs) can drop to 4 or 5 layers. The high-stakes layers are not warranted there.
Customer-facing production agents need all 11. Experience shows that anything less than the full set leaves gaps.
When in doubt, default to all 11. The cost of an extra layer is minor; the cost of a missing layer can be a postmortem.
Testing the layers
Each layer should have at least one eval case that fires it, for example "This case should trip the cap": pass if it does, fail if it does not.
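A "this case should trip the cap" eval might look like the sketch below. ActionCap is a hypothetical stand-in for a per-run action cap (layer 5); the eval passes only if the cap allows exactly the configured number of actions and refuses the rest.

```python
class ActionCap:
    """Hypothetical per-run action cap (layer 5)."""

    def __init__(self, max_actions: int) -> None:
        self.max_actions = max_actions
        self.used = 0

    def allow(self) -> bool:
        # Refuse once the budget for this run is spent.
        if self.used >= self.max_actions:
            return False
        self.used += 1
        return True


def eval_cap_trips(cap_limit: int = 3, attempts: int = 5) -> bool:
    """Pass if the first `cap_limit` attempts succeed and the rest are refused."""
    cap = ActionCap(cap_limit)
    results = [cap.allow() for _ in range(attempts)]
    expected = [True] * cap_limit + [False] * (attempts - cap_limit)
    return results == expected
```

The eval deliberately attempts more actions than the cap permits; a case that never reaches the limit would not show that the layer fires.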
Combined cases test layer interactions. "This case requires layer 3 to refuse, then layer 4 to escalate, then layer 11 to record." The case passes only if all three happen, in that order.
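One way to check a combined case is to assert on the audit trail itself, since it shows both what happened and in what order. The sketch below assumes a hypothetical handle function in which a failed pre-flight check (layer 3) escalates to approval (layer 4), with both decisions written to the audit log (layer 11); the tuple layout of the log entries is also an assumption.

```python
from typing import Callable

# Each entry: (layer number, decision, action name). Layout is illustrative.
AuditLog = list[tuple[int, str, str]]


def handle(action: str, preflight_ok: Callable[[str], bool], audit: AuditLog) -> str:
    if preflight_ok(action):
        audit.append((3, "pass", action))
        return "executed"
    audit.append((3, "refuse", action))      # layer 3 refuses
    audit.append((4, "escalate", action))    # layer 4 escalates to humans
    return "escalated"


def combined_case_passes() -> bool:
    audit: AuditLog = []
    outcome = handle("delete_volume", lambda a: False, audit)
    expected = [(3, "refuse", "delete_volume"), (4, "escalate", "delete_volume")]
    # Comparing the whole list checks both presence and order of the records.
    return outcome == "escalated" and audit == expected
```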
Run quarterly red-team exercises. Engineer scenarios specifically designed to slip through the layers; close any gaps found.