By Samson Tanimawo, PhD. Published Nov 6, 2026.

LLM and GenAI Cost Engineering

GenAI costs behave differently from other cloud spend. The levers for controlling them are well known but rarely all applied together. Done right, a 70-90% cost reduction is realistic.

Why GenAI cost is different

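Cost scales with token volume × the per-token price of the model tier, not with user count alone. A poor architecture can multiply the bill tenfold at the same user volume.

Because both factors are set by design choices rather than by user demand, the optimization opportunities are large.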
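To make the scaling concrete, here is a minimal sketch of the cost model in Python. The tier names, the prices in `PRICE_PER_MTOK`, and the two workloads are hypothetical numbers chosen for illustration, not real vendor pricing.

```python
from dataclasses import dataclass

# Hypothetical USD prices per million tokens; not real vendor pricing.
PRICE_PER_MTOK = {
    "small": {"input": 0.5, "output": 1.5},
    "large": {"input": 2.5, "output": 7.5},
}

@dataclass
class Workload:
    requests: int       # requests per month
    input_tokens: int   # average input tokens per request
    output_tokens: int  # average output tokens per request
    tier: str           # key into PRICE_PER_MTOK

def monthly_cost(w: Workload) -> float:
    """Monthly cost in USD: token volume times the tier's per-token price."""
    price = PRICE_PER_MTOK[w.tier]
    per_request = w.input_tokens * price["input"] + w.output_tokens * price["output"]
    return w.requests * per_request / 1_000_000

# Same request volume, different architecture.
lean = Workload(requests=1_000_000, input_tokens=1_000, output_tokens=200, tier="small")
bloated = Workload(requests=1_000_000, input_tokens=2_000, output_tokens=400, tier="large")

print(monthly_cost(lean))     # 800.0
print(monthly_cost(bloated))  # 8000.0
```

In this made-up case, doubling the tokens per request and moving to a tier priced five times higher produces a tenfold bill with no change in user volume.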

Four levers

Per-lever savings

Routing: 30-50% savings. Caching: 50-90% savings on cacheable prompts. Batch: 50% off the real-time price. Fine-tuning: model-dependent.
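As a rough illustration of the first two levers, the sketch below combines a toy routing rule with an exact-match response cache. Everything named here (`choose_model`, `CachedClient`, `fake_model`, the tier names, the length threshold) is hypothetical. Routing on prompt length is only a stand-in for a real "is this request simple?" decision, and provider-side prompt caching is a different mechanism from the response cache shown.

```python
import hashlib

# Hypothetical tier names; substitute whatever your provider offers.
CHEAP_MODEL = "small"
EXPENSIVE_MODEL = "large"

def choose_model(prompt: str, max_cheap_chars: int = 400) -> str:
    """Toy routing rule: short prompts go to the cheap tier."""
    return CHEAP_MODEL if len(prompt) <= max_cheap_chars else EXPENSIVE_MODEL

class CachedClient:
    """Wraps a model-calling function with an exact-match response cache."""

    def __init__(self, call_model):
        self._call_model = call_model  # callable(model, prompt) -> str
        self._cache = {}
        self.calls_made = 0            # paid calls actually sent

    def complete(self, prompt: str) -> str:
        model = choose_model(prompt)
        key = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._call_model(model, prompt)
            self.calls_made += 1
        return self._cache[key]

def fake_model(model: str, prompt: str) -> str:
    # Stand-in for a real API call so the sketch runs offline.
    return f"[{model}] answer to: {prompt[:20]}"

client = CachedClient(fake_model)
client.complete("What is our refund policy?")
client.complete("What is our refund policy?")  # repeated prompt, served from cache
client.complete("x" * 1000)                    # long prompt, routed to the large tier
print(client.calls_made)                       # 2
```

Three requests result in two paid calls, one of them on the cheap tier. Batch pricing and fine-tuning depend on the provider and are not modeled here.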

Combined: reductions of 70-90% are common.
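Per-lever savings compound rather than add, because each lever acts on whatever spend the previous ones left behind. The sketch below shows one way to estimate the combined effect. The coverage shares are illustrative assumptions, and treating the levers as independent is a simplification.

```python
def combined_savings(levers):
    """Estimate total savings from (coverage, saving) pairs, each in [0, 1].

    coverage = share of the remaining spend the lever applies to
    saving   = discount the lever achieves on that share
    Assumes the levers act independently of each other.
    """
    remaining = 1.0
    for coverage, saving in levers:
        remaining *= 1.0 - coverage * saving
    return 1.0 - remaining

# Illustrative inputs, not measurements.
levers = [
    (1.0, 0.40),  # routing: 40% saving across all traffic
    (0.6, 0.80),  # caching: 80% saving on the 60% of prompts that are cacheable
    (0.4, 0.50),  # batch: 50% off on the 40% of work that can wait
]

print(f"{combined_savings(levers):.0%}")  # 75%
```

With these inputs the remaining spend is 0.60 × 0.52 × 0.80 ≈ 0.25 of the original, a reduction of about 75%.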

Cost-aware GenAI culture

Engineers see per-feature LLM cost, each pull request carries a cost-impact estimate, and code reviews take cost into account.

The discipline mirrors traditional cost engineering.
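Per-feature visibility can start with a simple roll-up of usage records. The following sketch assumes each LLM call is logged with a feature tag, a model tier, and a token count. The record schema, feature names, and prices are hypothetical.

```python
from collections import defaultdict

# Hypothetical USD prices per million tokens; not real vendor pricing.
PRICE_PER_MTOK = {"small": 0.5, "large": 2.5}

# Hypothetical usage records, e.g. exported from request logs.
usage_log = [
    {"feature": "search-summary", "model": "small", "tokens": 1_200_000},
    {"feature": "search-summary", "model": "large", "tokens": 400_000},
    {"feature": "support-chat", "model": "large", "tokens": 2_000_000},
]

def cost_by_feature(records):
    """Sum USD cost per feature tag across all usage records."""
    totals = defaultdict(float)
    for r in records:
        totals[r["feature"]] += r["tokens"] * PRICE_PER_MTOK[r["model"]] / 1_000_000
    return dict(totals)

# Print features from most to least expensive.
for feature, usd in sorted(cost_by_feature(usage_log).items(), key=lambda kv: -kv[1]):
    print(f"{feature:16s} ${usd:,.2f}")
```

The same roll-up, run against the usage a pull request is expected to add or remove, gives a rough per-PR cost-impact estimate.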

Antipatterns

What to do this week

Three moves. (1) Apply one lever to your highest-spend workload. (2) Measure the dollar impact for one month. (3) If the savings hold, roll the practice out to the next two services.