The closed-loop action where a system detects an issue and resolves it without human intervention, within a policy envelope.
Auto-remediation is the closed-loop step where a monitoring or agentic system not only detects an issue but also takes the corrective action: scaling a deployment, restarting a pod, rotating an IAM key, draining a queue, failing over a database. The action is executed inside a policy envelope (allowed actions, blast-radius limits, reversibility requirement) so a wrong call cannot make the situation worse. Trust scores accumulate per action type, the agent earns more autonomy as it builds a track record.
Detection without action just shifts work to the on-call queue. Auto-remediation is what compresses MTTR from minutes to seconds for the routine 60-80% of pages, the agent fires the same kubectl scale or aws iam rotate the human would have, just faster and at any hour. Pair it with explicit human approval for high-blast-radius actions and the team gets speed without sacrificing safety.
See the part of the platform that handles auto-remediation in production.