AI Context Optimization is where you tune the AI plumbing without touching agent behavior: token usage per agent, prompt-cache hit rate, model routing recommendations, and context-trim opportunities. Implementing the recommendations typically cuts cost by 35-60% and latency by 20-40%, with no observable change in agent quality.
System prompts and few-shot examples rarely change. Marking them as cacheable (`cache_control` in the API request) means subsequent calls cost roughly 10% of fresh calls. The page reports the cache hit rate per agent and flags agents that should enable caching but have not.
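Assuming the Anthropic Messages API, this is a minimal sketch of what a cacheable system block looks like in the request body; the model id, prompt text, and agent are illustrative, not product data:

```python
def build_request(system_prompt: str, user_msg: str) -> dict:
    """Build a Messages API payload whose static system block is cacheable."""
    return {
        "model": "claude-sonnet-4-20250514",  # example model id
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Static prefix: later calls that reuse this exact block
                # hit the prompt cache instead of paying full input price.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_msg}],
    }

req = build_request("You are a log-triage agent.", "Triage this alert: ...")
```

The key point is that only the stable prefix (system prompt, few-shot examples) carries `cache_control`; the per-call user message stays uncached.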
Not every task needs Opus. Classification tasks ("which team owns this alert?") run well on Haiku at roughly 10% of the cost. The page reports per-class quality metrics across models, so you can see that log-triager on Haiku delivers 96% of Opus quality at 8% of the cost. Routing recommendations are concrete and conservative: no quality loss is the rule.
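A conservative router can be sketched in a few lines. The routing table, task classes, model names, and quality scores below are hypothetical placeholders, not the product's data; the point is the fallback rule, which only routes down when measured quality clears the bar:

```python
# task_class: (cheaper_model, measured_quality_vs_default, relative_cost)
ROUTING_TABLE = {
    "log-triage":      ("claude-3-5-haiku-latest", 0.96, 0.08),
    "alert-ownership": ("claude-3-5-haiku-latest", 0.97, 0.08),
    "incident-rca":    ("claude-3-5-haiku-latest", 0.81, 0.08),
}

DEFAULT_MODEL = "claude-opus-4-20250514"  # example default model id

def route(task_class: str, min_quality: float = 0.95) -> str:
    """Use the cheaper model only when its measured quality clears the bar.
    Unknown task classes stay on the default model."""
    entry = ROUTING_TABLE.get(task_class)
    if entry is None:
        return DEFAULT_MODEL
    model, quality, _cost = entry
    return model if quality >= min_quality else DEFAULT_MODEL

route("log-triage")    # routes down: 0.96 >= 0.95
route("incident-rca")  # stays on the default: 0.81 < 0.95
```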
Many prompts include more context than they need. The trimmer analyzes which context items are actually cited in the agent's reasoning and recommends dropping items with a citation rate below 5%. Trimming cuts tokens directly, and as long as cited items stay, agent behavior does not change.
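The trim rule is easy to state as code. A minimal sketch, assuming citation data is available as one set of cited item ids per agent call (the item names below are made up):

```python
from collections import Counter

def trim_candidates(context_items, cited_per_call, threshold=0.05):
    """Return context items cited in fewer than `threshold` of calls.

    context_items:  list of item ids included in the prompt
    cited_per_call: one set of cited item ids per agent call
    """
    calls = len(cited_per_call)
    counts = Counter(item for cited in cited_per_call for item in cited)
    return [item for item in context_items
            if counts[item] / calls < threshold]

# 100 calls: "runbook" cited every time, "org-chart" only 3 times (3% < 5%)
calls = [{"runbook"}] * 97 + [{"runbook", "org-chart"}] * 3
trim_candidates(["runbook", "org-chart"], calls)
```

Items that clear the threshold are untouched, which is what preserves agent behavior.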
Routine work that does not need a realtime response (nightly summaries, weekly digests, scheduled audits) should use the Batch API, which is ~50% cheaper than realtime calls. The page lists scheduled jobs and recommends which ones can move to batch. Moving them typically saves another 8-12% on top of the caching and routing wins.
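Estimating the win is simple arithmetic: sum the spend of jobs that can tolerate batch latency and apply the discount. The job names and spend figures below are illustrative:

```python
def batch_savings(jobs, batch_discount=0.5):
    """Estimate monthly savings from moving non-realtime jobs to batch.

    jobs: maps job name -> (monthly_spend_usd, needs_realtime)
    Returns (estimated_savings, names_of_movable_jobs).
    """
    movable = {name: spend for name, (spend, rt) in jobs.items() if not rt}
    return sum(movable.values()) * batch_discount, sorted(movable)

jobs = {
    "nightly-summary": (400.0, False),
    "weekly-digest":   (120.0, False),
    "live-triage":     (900.0, True),  # realtime: stays on the sync API
}
savings, moved = batch_savings(jobs)  # 50% of the $520 movable spend
```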
Subscribe to Nova AI Ops on YouTube for demos, tutorials, and feature deep-dives.
Cut AI cost by 50% without dumbing down a single agent. Caching, routing, and trimming, applied where they work.