FinOps Intermediate By Samson Tanimawo, PhD Published Nov 6, 2026 9 min read

LLM and GenAI Cost Engineering

GenAI costs are unique; the levers are well-known but rarely all applied. Done right, 70-90% reduction is realistic.

Why GenAI cost is different

Cost scales with token volume × model tier. A bad architecture can 10x the bill at the same user volume.

The optimization opportunities are large.

Four levers

1. Model routing (cheap model for easy tasks).
2. Prompt caching (Anthropic, OpenAI both support).
3. Batch API for non-realtime.
4. Fine-tuning instead of long prompts.

Per-lever savings

Routing: 30-50% savings. Caching: 50-90% on cacheable prompts. Batch: 50% off realtime price. Fine-tuning: model-dependent.

Combined: 70-90% common.

Cost-aware GenAI culture

Engineers see per-feature LLM cost; per-PR cost-impact estimate; cost-aware reviews.

The discipline mirrors traditional cost engineering.

Antipatterns

Default to expensive model for everything. 10x bill premium.
No prompt caching. Pay full price for every cached prompt.
No per-feature cost view. Cannot optimize.

What to do this week

Three moves. (1) Apply this lever to your highest-spend workload. (2) Measure the dollar impact for one month. (3) Roll the practice out to the next two services if the savings hold.