Agentic SRE | Advanced | By Samson Tanimawo, PhD | Published Jun 6, 2026 | 5 min read

Latency Budgets for Production Agents

Triage agents should respond in seconds, remediation agents in minutes, and postmortem agents in hours. This post sets out a latency budget per agent type and shows how to enforce it without hurting quality.

Default budgets by role

Triage: p95 < 6 seconds. The on-call is waiting; long latency erodes the value proposition.

Investigation: p95 < 60 seconds. Acceptable when the agent is doing iterative reasoning.

Postmortem drafting: p95 < 5 minutes. Nobody is waiting; quality matters more than speed.

Audit reports: p95 < 1 hour. Background workload; the operator wants results by the next morning.
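The budgets above are easier to enforce when they live in one place in code rather than in each agent. A minimal sketch, assuming roles are identified by short string keys; the names `LatencyBudget`, `DEFAULT_BUDGETS`, and `budget_for` are illustrative and not taken from any particular framework:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    role: str
    p95_seconds: float  # p95 target for a full run of this agent role


# Values mirror the default budgets listed above; the role keys are assumed names.
DEFAULT_BUDGETS = {
    b.role: b
    for b in (
        LatencyBudget("triage", 6),
        LatencyBudget("investigation", 60),
        LatencyBudget("postmortem", 5 * 60),
        LatencyBudget("audit", 60 * 60),
    )
}


def budget_for(role: str) -> LatencyBudget:
    """Look up a role's budget, failing loudly for roles with no written budget."""
    try:
        return DEFAULT_BUDGETS[role]
    except KeyError:
        raise ValueError(f"no latency budget defined for role {role!r}") from None
```

Failing on an unknown role, instead of falling back to a default, forces every new agent type to get an explicit budget.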

Enforcing the budget

Hard timeout per agent role: triage agents are killed at 30 seconds (5x the 6-second p95). The hard timeout is the safety net behind the SLO, not the target itself.

Soft warning at p95: log a slow-run event. The slow runs go to a debug queue.

Per-step latency caps: a tool call that hangs should not consume the whole run's budget. Each tool has its own timeout.
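The three mechanisms above (hard timeout, soft warning at p95, per-tool timeout) can be sketched with `asyncio.wait_for` from the standard library. This is one possible shape, assuming agents and tools are async callables; `run_with_budget`, `call_tool`, the logger name, and the list used as a debug queue are hypothetical stand-ins for whatever your runtime provides:

```python
import asyncio
import logging
import time
from typing import Any, Awaitable, Callable

log = logging.getLogger("agent.latency")  # hypothetical logger name


async def call_tool(tool: Callable[[], Awaitable[Any]], timeout_s: float) -> Any:
    """Per-step cap: a hung tool raises asyncio.TimeoutError after timeout_s."""
    return await asyncio.wait_for(tool(), timeout=timeout_s)


async def run_with_budget(
    agent: Callable[[], Awaitable[Any]],
    role: str,
    p95_s: float,
    hard_timeout_s: float,
    debug_queue: list,
) -> Any:
    """Run an agent under a hard timeout and flag runs slower than the p95 target."""
    start = time.monotonic()
    try:
        # Hard timeout: wait_for cancels the agent task once the limit passes.
        result = await asyncio.wait_for(agent(), timeout=hard_timeout_s)
    except asyncio.TimeoutError:
        log.error("hard timeout: role=%s limit=%.1fs", role, hard_timeout_s)
        debug_queue.append({"role": role, "seconds": hard_timeout_s, "killed": True})
        raise
    elapsed = time.monotonic() - start
    if elapsed > p95_s:
        # Soft warning: the run finished, but slower than the p95 target.
        log.warning("slow run: role=%s elapsed=%.2fs p95=%.1fs", role, elapsed, p95_s)
        debug_queue.append({"role": role, "seconds": elapsed, "killed": False})
    return result
```

For the triage numbers above, the call would be `run_with_budget(agent, "triage", p95_s=6, hard_timeout_s=30, debug_queue=queue)`. The sketch assumes the agent catches tool timeouts itself; a tool timeout that escapes the agent would be logged here as a hard timeout.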

What to optimise when budgets miss

Prompt size: smaller prompts produce faster responses. Trim unused context.

Cache hits: prompt caching cuts latency substantially compared with a cold cache, but only when requests actually hit the cache. Verify the cache hit rate.

Tool latency: a single slow tool call can dominate run latency. Optimise the slowest tool first.

Model choice: a smaller model often produces good-enough quality at much lower latency. Re-evaluate periodically.
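Two of the checks above, cache hit rate and the slowest tool, are simple to compute from run records. A rough sketch, assuming each run is logged as a dict with a `cache_hit` flag and a list of `tool_calls`; that record shape and the function names are assumptions for illustration, not a standard format:

```python
from collections import defaultdict


def cache_hit_rate(runs: list) -> float:
    """Fraction of runs whose prompt was served from cache (0.0 if no runs)."""
    if not runs:
        return 0.0
    return sum(1 for r in runs if r["cache_hit"]) / len(runs)


def tools_by_total_latency(runs: list) -> list:
    """Return (tool, total_seconds) pairs, slowest tool first."""
    totals = defaultdict(float)
    for run in runs:
        for call in run["tool_calls"]:
            totals[call["tool"]] += call["seconds"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records for two runs.
runs = [
    {"cache_hit": True, "tool_calls": [{"tool": "log_search", "seconds": 4.0},
                                       {"tool": "metrics_query", "seconds": 0.5}]},
    {"cache_hit": False, "tool_calls": [{"tool": "log_search", "seconds": 3.0}]},
]
```

With these records, `tools_by_total_latency(runs)[0]` names `log_search` as the first tool to optimise, and `cache_hit_rate(runs)` reports that half the runs missed the cache.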

Track budgets in production

Per-agent latency dashboard with p50/p95/p99 over time. Trend lines over weeks tell the story.

Per-step latency breakdown. Helps identify the bottleneck step in slow runs.

Per-tool latency. Tools degrade silently; the dashboard catches it.
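The p50/p95/p99 figures behind such a dashboard can be computed from raw run durations. A minimal sketch using the nearest-rank percentile definition (other definitions interpolate and give slightly different values); `percentile`, `latency_summary`, and `misses_budget` are illustrative names, and the same functions apply to per-step or per-tool durations:

```python
import math


def percentile(samples: list, p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("percentile of empty sample set")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples: list) -> dict:
    """Summarise run durations (seconds) as the three dashboard percentiles."""
    return {
        "p50": percentile(samples, 50),
        "p95": percentile(samples, 95),
        "p99": percentile(samples, 99),
    }


def misses_budget(samples: list, p95_budget_seconds: float) -> bool:
    """True when the observed p95 is not under the role's p95 budget."""
    return percentile(samples, 95) >= p95_budget_seconds
```

Computing the summary per time window (for example, per day) and plotting the series gives the week-over-week trend lines described above.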

When to trade quality for latency

Triage: yes. The on-call cannot wait; a 90% accurate triage in 4 seconds is more valuable than a 95% accurate one in 12 seconds.

Postmortem: no. Wait the extra time for higher quality.

The trade is product-specific. Document it. "Triage agent prefers latency over the last 5% of accuracy" is a written decision, not a default.
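One way to make the written decision executable is to choose, per role, the most accurate configuration that still fits the latency budget. A sketch under stated assumptions: the candidate names are hypothetical, and the 4-second and 12-second figures from the triage example are treated here as p95 latencies, which the example does not state explicitly:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    accuracy: float      # measured on your own evaluation set
    p95_seconds: float   # measured p95 latency for a full run


def pick_config(candidates: Sequence[Candidate], budget_p95_seconds: float) -> Optional[Candidate]:
    """Most accurate candidate whose p95 is under the budget, or None if none fits."""
    within = [c for c in candidates if c.p95_seconds < budget_p95_seconds]
    if not within:
        return None
    return max(within, key=lambda c: c.accuracy)


# Hypothetical candidates using the numbers from the triage example.
candidates = [
    Candidate("small-model", accuracy=0.90, p95_seconds=4),
    Candidate("large-model", accuracy=0.95, p95_seconds=12),
]
```

Under the 6-second triage budget this selects `small-model`; under the 5-minute postmortem budget both candidates fit, so accuracy wins and `large-model` is selected. A `None` result means no candidate meets the budget and the decision needs to be revisited by a person.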