By Samson Tanimawo, PhD. Published Jun 30, 2026.

The Agent Cost Bomb: Pre-emptive Token Budgets

One stuck agent can burn $400 in an hour. This piece covers the budget enforcement layer that stops it before it does, plus the alerting that wakes you up when budgets blow up across runs.

Three budget dimensions

Per-run budget: how much a single invocation can cost. Caps the worst case.

Per-tenant budget: how much a single user or service can trigger in a window. Prevents abuse and runaway integrations.

Aggregate budget: how much the agent fleet can spend per day. Catches rare but expensive scenarios that slip past per-run caps.
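As a rough illustration, the three dimensions can live in one small config object that is checked together. This is a minimal sketch; the names `BudgetLimits` and `exceeded` and the dollar figures in the usage note are hypothetical, not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetLimits:
    """Hypothetical container for the three budget dimensions (USD)."""
    per_run_usd: float      # cap on a single invocation
    per_tenant_usd: float   # cap per user or service within the window
    aggregate_usd: float    # cap for the whole fleet per day


def exceeded(limits, run_spend, tenant_spend, fleet_spend):
    """Return the name of every dimension whose spend is over its cap."""
    checks = [
        ("per_run", run_spend, limits.per_run_usd),
        ("per_tenant", tenant_spend, limits.per_tenant_usd),
        ("aggregate", fleet_spend, limits.aggregate_usd),
    ]
    return [name for name, spent, cap in checks if spent > cap]
```

With illustrative limits of `BudgetLimits(2.0, 50.0, 1000.0)`, a run at $1 for a tenant already at $60 reports only the per-tenant dimension, which is the case a per-run cap alone would miss.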

Enforcement layer

Track tokens spent within the loop. Before each model call, check whether the next call would exceed the budget. If so, do not call; abort or escalate.

The check is cheap: a counter and a comparison. Far cheaper than the model call it might prevent.

Cost is computed in real time, not after the run. After-the-fact accounting is for billing; before-the-fact accounting is for safety.
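The check-before-call pattern described above might look like the following. It is a sketch under stated assumptions: a single flat price per 1,000 tokens, a caller-supplied estimate of the next call's size, and a hypothetical `call_model` function that returns the reply along with the tokens it actually used. Real pricing usually differs for input and output tokens.

```python
class BudgetExceeded(Exception):
    """Raised instead of making a model call that would break the budget."""


class RunBudget:
    """Tracks spend for one run. The flat per-1k-token price is an assumption."""

    def __init__(self, limit_usd, usd_per_1k_tokens):
        self.limit_usd = limit_usd
        self.usd_per_1k_tokens = usd_per_1k_tokens
        self.spent_usd = 0.0

    def cost(self, tokens):
        return tokens / 1000 * self.usd_per_1k_tokens

    def check(self, estimated_tokens):
        # The whole safety check: one addition and one comparison.
        projected = self.spent_usd + self.cost(estimated_tokens)
        if projected > self.limit_usd:
            raise BudgetExceeded(
                f"next call would bring spend to ${projected:.4f}, "
                f"over the ${self.limit_usd:.4f} limit"
            )

    def record(self, actual_tokens):
        # Spend is updated in real time, as soon as the call returns.
        self.spent_usd += self.cost(actual_tokens)


def guarded_call(budget, call_model, prompt, estimated_tokens):
    """Check first, call second, record third. `call_model` is hypothetical."""
    budget.check(estimated_tokens)
    reply, tokens_used = call_model(prompt)
    budget.record(tokens_used)
    return reply
```

The caller decides what `BudgetExceeded` means in practice: abort the run or hand it to a human. The guard itself never makes the call it was asked to block.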

Alerting on budget excursions

Per-run budget hits should be rare but visible. Page on the rate, not on individual events. "More than 1% of runs hit the cap today" is the alert.

Aggregate budget hits should be a hard page. The fleet is misbehaving; someone needs to look immediately.

Per-tenant hits go to a dashboard, reviewed daily. Patterns reveal abuse, runaway integrations, or budget that needs raising for legitimate reasons.
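The three alerting policies can be sketched as one routing function. The function name, the string labels, and the default threshold parameter are illustrative assumptions; only the 1% rate rule, the hard page on aggregate hits, and the dashboard for per-tenant hits come from the text.

```python
def route_alert(kind, runs_today=0, runs_capped_today=0, rate_threshold=0.01):
    """Decide where a budget hit goes: 'page', 'dashboard', or 'none'."""
    if kind == "aggregate":
        # Fleet-wide overspend: always a hard page.
        return "page"
    if kind == "per_tenant":
        # Reviewed daily, never pages on its own.
        return "dashboard"
    if kind == "per_run":
        # Page on the rate of capped runs, not on any single event.
        if runs_today == 0:
            return "none"
        rate = runs_capped_today / runs_today
        return "page" if rate > rate_threshold else "none"
    raise ValueError(f"unknown budget kind: {kind!r}")
```

For example, 5 capped runs out of 1,000 stays quiet, while 20 out of 1,000 crosses the 1% line and pages.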

Calibrating the budgets

Start with a sample of normal runs. Take p95 and round up. That is your starting per-run budget.
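One way to compute that starting number is shown below, as a sketch. It uses the nearest-rank definition of p95 and rounds up to a $0.25 step; both the percentile method and the rounding step are assumptions, so use whatever your metrics stack already reports.

```python
import math


def starting_budget(costs_usd, round_to=0.25):
    """Nearest-rank p95 of sampled run costs, rounded up to a multiple of round_to."""
    if not costs_usd:
        raise ValueError("need at least one sampled run cost")
    ordered = sorted(costs_usd)
    # Integer ceiling of 0.95 * n, which avoids floating-point surprises.
    rank = (95 * len(ordered) + 99) // 100
    p95 = ordered[rank - 1]
    return math.ceil(p95 / round_to) * round_to
```

Given 20 sampled runs where 18 cost $1.00, one cost $1.90, and one outlier cost $5.00, the p95 is $1.90 and the starting budget is $2.00. The $5.00 outlier does not drag the cap up, which is the point of using p95 rather than the maximum.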

Watch closely for the first month and review every run that hits the cap. If the run should have completed, raise the cap. If it was misbehaving, leave the cap where it is.

Quarterly: re-calibrate against the latest p95. Models get cheaper; budgets should drop. A budget unchanged from 12 months ago is probably too loose.

Fail closed on budget exhaustion

When the budget runs out, the agent stops. It does not try harder, it does not skip steps, it does not continue without verification.

The agent escalates to a human. The escalation includes the partial state: what was learned before the budget ran out.
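A fail-closed loop that hands over its partial state could be sketched like this. Everything here is hypothetical scaffolding: steps are modeled as `(name, estimated_cost_usd, callable)` tuples, and `Escalation` stands in for whatever ticket or page payload your team actually uses.

```python
from dataclasses import dataclass


@dataclass
class Escalation:
    """Hypothetical hand-off payload for the human who picks up the run."""
    reason: str
    completed_steps: list
    findings: dict


def run_agent(steps, budget_usd):
    """Run steps in order; stop and escalate before any step that would overspend."""
    spent = 0.0
    completed = []
    findings = {}
    for name, cost_usd, step_fn in steps:
        if spent + cost_usd > budget_usd:
            # Fail closed: no retries, no skipped steps, no unverified shortcuts.
            return Escalation(
                reason=f"budget exhausted before step {name!r}",
                completed_steps=completed,
                findings=findings,
            )
        findings[name] = step_fn()
        spent += cost_usd
        completed.append(name)
    return findings
```

The key design choice is that the blocked step is never attempted and later steps are never reordered to fit. The human receives exactly what was verified so far and nothing else.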

Resist the temptation to "just bump the budget" when escalations come in. Each bump is a chance to ask why the budget was inadequate; usually the answer is a prompt that needs work, not a budget that needs raising.