Reverting Agent Actions: The Undo Strategy You Need
Agents make mistakes. Recovering from them takes three pieces: the reversibility classifier, the undo store, and the human escalation path for actions that cannot be undone automatically.
Classify actions by reversibility
Reversibility is the property that decides everything else. Three classes cover most real actions, and each needs a different recovery strategy.
- Class A: trivially reversible. Toggle a flag, restart a pod. Auto-revert is safe; record and undo with one tool call.
- Class B: complex reversible. Roll back a deploy, restore a config. Auto-revert is risky; manual revert with the agent’s recorded delta is preferred.
- Class C: irreversible. Drop a table, send an email, charge a credit card. No revert; prevention is the only strategy.
- Class assignment at registration time. Each tool registration declares its class; the agent never picks the class on the fly when it acts.
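One way to keep the class out of the agent's hands is to make it a required field of the tool registration. The sketch below is illustrative only: `Reversibility`, `Tool`, `ToolRegistry`, and the three example tool names are hypothetical, not part of any particular agent framework.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Reversibility(Enum):
    A = "trivially_reversible"   # toggle a flag, restart a pod
    B = "complex_reversible"     # roll back a deploy, restore a config
    C = "irreversible"           # drop a table, send an email


@dataclass(frozen=True)
class Tool:
    name: str
    reversibility: Reversibility
    run: Callable[..., object]


class ToolRegistry:
    """Hypothetical registry: the class is fixed once, at registration."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, reversibility: Reversibility,
                 run: Callable[..., object]) -> None:
        # Refuse re-registration so a class cannot be quietly downgraded later.
        if name in self._tools:
            raise ValueError(f"tool {name!r} is already registered")
        self._tools[name] = Tool(name, reversibility, run)

    def classify(self, name: str) -> Reversibility:
        # The agent can look the class up but has no way to set it.
        return self._tools[name].reversibility


registry = ToolRegistry()
registry.register("toggle_flag", Reversibility.A, lambda flag, value: (flag, value))
registry.register("rollback_deploy", Reversibility.B, lambda service: service)
registry.register("drop_table", Reversibility.C, lambda table: table)
```

Because `Tool` is frozen and re-registration raises, the only way to change a tool's class in this sketch is a code change that goes through review.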
The undo store
The undo store is the artefact that makes revert real. Without it, “undo” is wishful thinking.
- Per-action record. What was changed, what the previous state was, what the change ID was, who did it, when.
- Queryable. “Show me everything the agent changed in the last hour” returns a list with revert tools per row.
- Retention. 30 days for Class A, 90 days for Class B. After that, the change has settled and should not be undone.
- Tamper-evident. Each entry is hash-chained. A revert against a tampered entry is refused; the audit trail must survive a bad day.
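The record, query, and tamper-evidence properties above can be sketched with an in-memory store. This is a minimal illustration, assuming JSON-serialisable states and epoch-second timestamps; `UndoStore`, `UndoEntry`, and every field name are hypothetical, and a real store would persist entries and enforce retention.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Dict, List

GENESIS = "0" * 64  # assumed placeholder hash for the first entry in the chain


@dataclass(frozen=True)
class UndoEntry:
    payload: Dict[str, Any]  # what changed, previous state, change ID, who, when
    prev_hash: str           # hash of the entry before this one
    entry_hash: str          # SHA-256 over this payload plus prev_hash


def _digest(payload: Dict[str, Any], prev_hash: str) -> str:
    body = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class UndoStore:
    def __init__(self) -> None:
        self.entries: List[UndoEntry] = []

    def append(self, change_id: str, actor: str, timestamp: float, target: str,
               previous_state: Any, new_state: Any) -> UndoEntry:
        payload = {
            "change_id": change_id, "actor": actor, "timestamp": timestamp,
            "target": target, "previous_state": previous_state,
            "new_state": new_state,
        }
        prev_hash = self.entries[-1].entry_hash if self.entries else GENESIS
        entry = UndoEntry(payload, prev_hash, _digest(payload, prev_hash))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        # Recompute every hash; any edited payload or broken link fails the chain.
        prev = GENESIS
        for entry in self.entries:
            if entry.prev_hash != prev:
                return False
            if entry.entry_hash != _digest(entry.payload, prev):
                return False
            prev = entry.entry_hash
        return True

    def changed_since(self, cutoff: float) -> List[UndoEntry]:
        # "Show me everything the agent changed in the last hour."
        return [e for e in self.entries if e.payload["timestamp"] >= cutoff]
```

A revert path built on this sketch would call `verify()` first and refuse to proceed if it returns `False`.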
The revert UI
The UI matters because reverts happen under stress. Four properties keep the operator from making the wrong move when seconds matter.
- Recent-action table. Each row has a revert button. The button shows what the revert would do; the operator confirms before it fires.
- Bulk-revert gating. “Revert everything in the last hour” is dangerous and requires approval plus a written reason.
- Audited reverts. Reverts are at least as well-logged as the original actions. The audit trail captures both.
- Pre-flight check. Before revert fires, the agent re-fetches current state. If the world drifted, the revert refuses rather than overwriting.
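The preview and pre-flight behaviour can be sketched as a revert function that re-reads live state and refuses when it no longer matches what the agent wrote. The names `RecordedChange`, `DriftError`, `describe_revert`, and `revert`, along with the `fetch_current` and `apply_state` callbacks, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


class DriftError(Exception):
    """Raised when live state no longer matches what the agent recorded."""


@dataclass(frozen=True)
class RecordedChange:
    target: str
    previous_state: Any  # state before the agent acted
    new_state: Any       # state the agent left behind


def describe_revert(change: RecordedChange) -> str:
    # Text a revert button could show before the operator confirms.
    return (f"Set {change.target} from {change.new_state!r} "
            f"back to {change.previous_state!r}")


def revert(change: RecordedChange,
           fetch_current: Callable[[str], Any],
           apply_state: Callable[[str, Any], None]) -> None:
    # Pre-flight: re-fetch current state immediately before writing.
    current = fetch_current(change.target)
    if current != change.new_state:
        # Something else touched the resource; refuse rather than overwrite it.
        raise DriftError(
            f"{change.target}: expected {change.new_state!r}, found {current!r}")
    apply_state(change.target, change.previous_state)
```

Note that a gap remains between the fetch and the write in this sketch; a backend that supports conditional writes or version checks could close it.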
Class C and the escalation path
Class C actions cannot be undone. Recovery is human-led; the agent’s job is to hand the human the right context, fast.
- No revert path. Once a Class C action is found to be wrong, the response is recovery, not revert.
- Human-led recovery. The agent provides full context: what was done, why, what the consequences look like, what the recovery options are.
- Quarterly tabletop. Practice Class C recovery. “The agent dropped the wrong table; what is your response?” surfaces gaps in playbooks before they hit production.
- Pre-action gating. Class C actions require an explicit human approval at action time, not just at agent-design time. Prevention beats recovery.
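Pre-action gating can be sketched as a wrapper that refuses to run a Class C action unless an approval for that specific action is attached. `Approval`, `ApprovalRequired`, `execute`, and the `action_id` scheme are hypothetical; how approvals are collected and authenticated is out of scope here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Reversibility(Enum):
    A = "trivially_reversible"
    B = "complex_reversible"
    C = "irreversible"


class ApprovalRequired(Exception):
    """Raised when a Class C action is attempted without a matching approval."""


@dataclass(frozen=True)
class Approval:
    approver: str
    action_id: str  # the one specific action this approval covers


def execute(action_id: str,
            reversibility: Reversibility,
            run: Callable[[], object],
            approval: Optional[Approval] = None) -> object:
    if reversibility is Reversibility.C:
        # Approval is checked at action time and is bound to this action only.
        if approval is None or approval.action_id != action_id:
            raise ApprovalRequired(
                f"action {action_id!r} is irreversible and needs human approval")
    return run()
```

Binding the approval to an `action_id` means a sign-off for one table drop cannot be reused for another.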
Limits of revert as a safety strategy
Revert is one safety mechanism, not the safety mechanism. Four limits show why it cannot stand alone.
- Record vs reality. Revert assumes the agent’s record matches reality. Drift between record and reality breaks revert silently; validate periodically.
- Single-actor assumption. Revert handles single-actor scenarios. When multiple agents and humans have touched the same resource, revert can produce unexpected merges.
- Not a substitute for prevention. Prefer caps, approvals, and sandboxes. Revert is the last resort.
- Time-window decay. The longer the gap between action and revert, the lower the probability that revert restores the intended state. Speed matters.
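The periodic validation mentioned under "Record vs reality" could look like a sweep that compares the latest recorded state of each resource with what is live. This is a rough sketch: `find_drift`, the `(target, new_state)` record shape, and the `fetch_current` callback are assumptions, and records are assumed to arrive oldest first.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def find_drift(records: Iterable[Tuple[str, Any]],
               fetch_current: Callable[[str], Any]) -> List[str]:
    """Return targets whose live state differs from the agent's latest record."""
    latest: Dict[str, Any] = {}
    for target, new_state in records:
        # Later records overwrite earlier ones, so only the newest state counts.
        latest[target] = new_state
    return [target for target, expected in latest.items()
            if fetch_current(target) != expected]
```

Targets returned by the sweep are the ones where a revert would be refused or would overwrite someone else's change, so they are worth flagging before anyone needs to revert under pressure.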