Latency Budgets for Production Agents

Triage agents should respond in seconds, remediation agents in minutes, and postmortem agents in hours. This guide covers the latency budget for each agent type and how to enforce it without hurting quality.

Default budgets by role

Latency budgets are role-specific because operator patience varies wildly between “the page just fired” and “the report runs overnight.” Set the budget against the role, not the model.
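One way to make "set the budget against the role" concrete is a small lookup keyed by role rather than by model. The sketch below is illustrative only: the `LatencyBudget` type, the `budget_for` helper, and the specific numbers are assumptions chosen to match the seconds / minutes / hours ordering above, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """A p95 latency target for one agent role (hypothetical type)."""
    role: str
    p95_seconds: float


# Assumed example values: only the ordering (seconds < minutes < hours)
# comes from the text; tune the numbers to your own operators.
BUDGETS = {
    "triage": LatencyBudget("triage", 10.0),
    "remediation": LatencyBudget("remediation", 5 * 60.0),
    "postmortem": LatencyBudget("postmortem", 4 * 3600.0),
}


def budget_for(role: str) -> LatencyBudget:
    """Look up the budget by role, failing loudly for unknown roles."""
    try:
        return BUDGETS[role]
    except KeyError:
        raise ValueError(f"no latency budget defined for role {role!r}") from None
```

Keying the table by role means a model swap does not silently change the target: the budget stays attached to what the operator is waiting for.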

Enforcing the budget

Without enforcement, budgets are aspirations. The four controls below convert the SLO into actual runtime behaviour.
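As a minimal sketch of what runtime enforcement can look like, the example below wraps an agent call in a hard deadline and returns a fallback when the deadline passes. It is one possible control, offered under assumptions: `run_with_budget`, `make_call`, and `fallback` are hypothetical names, and the agent call is assumed to be an `asyncio` coroutine.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")


async def run_with_budget(
    make_call: Callable[[], Awaitable[T]],
    budget_seconds: float,
    fallback: T,
) -> Tuple[T, bool]:
    """Run an agent call under a hard deadline.

    Returns (result, within_budget). If the deadline passes, the call is
    cancelled and the fallback is returned with within_budget=False.
    """
    try:
        result = await asyncio.wait_for(make_call(), timeout=budget_seconds)
        return result, True
    except asyncio.TimeoutError:
        # wait_for has already cancelled the underlying task at this point.
        return fallback, False
```

Returning the `within_budget` flag alongside the result lets the caller log every miss, so a fallback response still shows up in latency reporting instead of hiding the overrun.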

What to optimise when budgets miss

When p95 drifts above the budget, four levers usually explain it. Pull them in roughly this order; the cheap wins are at the top.

Track budgets in production

Latency degrades silently until it does not. The dashboards below catch the drift weeks before an operator complains.
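Catching drift early usually means comparing the observed p95 against the budget before it is breached. The sketch below computes a nearest-rank p95 over a window of latency samples and reports a warning state once the p95 passes a fraction of the budget. The function names and the 0.8 warning fraction are assumptions for illustration, not values from this article.

```python
from typing import Sequence


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty set of latency samples."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # 1-based nearest rank: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def budget_status(
    samples: Sequence[float],
    budget_seconds: float,
    warn_fraction: float = 0.8,  # assumed early-warning threshold
) -> str:
    """Classify a window of samples as 'ok', 'warn', or 'breach'."""
    observed = p95(samples)
    if observed > budget_seconds:
        return "breach"
    if observed > warn_fraction * budget_seconds:
        return "warn"
    return "ok"
```

A "warn" state that fires below the budget is what buys lead time: it gives the team something to act on while the p95 is still inside the target.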

When to trade quality for latency

The trade is product-specific. Document the chosen direction so the eval team and the prompt team optimise for the same target.