GLOSSARY · A

Agentic SRE

An operating model for site reliability where specialized AI agents own the full incident loop, detect, diagnose, decide, remediate, audit, with human policy as the guardrail.

Definition

Agentic SRE is an architecture for site reliability in which specialized AI agents (not dashboards, not a single general LLM) own the full operational loop: detection, diagnosis, decision, remediation, audit, and learning. Each agent has a narrow scope (one runbook pattern, one tool, one signal type), a measurable trust score, and a policy envelope that constrains what it can do without human approval. The architecture differs from AIOps in that AIOps platforms surface signals for humans to act on, while Agentic SRE platforms close the loop themselves on routine work.

Why it matters

Static thresholds and dashboard-driven workflows scale badly past a few hundred services. Agentic SRE compresses MTTR (the time from anomaly to resolution) by removing the slowest step in the loop, the human walking from page to dashboard to runbook to terminal. Teams that adopt it report 60-80% of routine pages closing without a human waking up.

How Nova handles it

See the part of the platform that handles agentic sre in production.

How Nova does Agentic SRE

Agentic SRE

Definition

Why it matters

How Nova handles it

Related terms