LLM Gateway Design
An LLM gateway sits between your app and the providers. Routing, caching, fallback, observability, and cost control all live here. Building one is a weekend; not having one is a year of small fires.
What it does
A single endpoint your app calls. The gateway routes to the right provider/model, caches when possible, fails over when a provider is down, logs everything for billing and audit, and enforces rate limits and budgets.
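The failover piece can be sketched in a few lines: try providers in order, fall through on error. The provider callables here are hypothetical stand-ins, not a real SDK; a production gateway would also narrow the caught exceptions and add timeouts and retries.

```python
# Hypothetical provider callables -- stand-ins for real SDK calls.
def call_openai(prompt: str) -> str:
    raise TimeoutError("provider down")  # simulate an outage

def call_anthropic(prompt: str) -> str:
    return f"anthropic: {prompt}"

# Ordered fallback chain: try each provider, move to the next on failure.
PROVIDERS = [("openai", call_openai), ("anthropic", call_anthropic)]

def complete(prompt: str) -> str:
    errors = []
    for name, call in PROVIDERS:
        try:
            return call(prompt)
        except Exception as exc:  # a real gateway would catch narrower errors
            errors.append((name, exc))
    raise RuntimeError(f"all providers failed: {errors}")

print(complete("hello"))  # -> "anthropic: hello" after openai times out
```

The important property is that the caller never sees the outage: one endpoint, one call, and the fallback chain is gateway config rather than app code.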
OSS options
- LiteLLM: Python library + proxy server. 100+ providers. The default starting point.
- Portkey: managed gateway with strong analytics, also self-hostable.
- OpenRouter: hosted gateway, transparent pricing across providers.
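To make the LiteLLM option concrete, here is a sketch of a proxy config: one `model_name` alias that your app requests, backed by concrete provider models. Field names follow LiteLLM's documented config schema, but check the current docs before copying; listing the same alias twice is how LiteLLM expresses load-balancing/fallback across deployments.

```yaml
model_list:
  - model_name: gpt-4o              # the name your app requests
    litellm_params:
      model: openai/gpt-4o          # actual provider/model
      api_key: os.environ/OPENAI_API_KEY
  - model_name: gpt-4o              # same alias -> alternate deployment
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
```

Run it with `litellm --config config.yaml` and point your OpenAI-compatible client at the proxy's base URL.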
Must-have features
- Provider-agnostic API (OpenAI-compatible).
- Routing (rule-based, cost-based, fallback).
- Caching (exact and semantic).
- Observability (cost-per-request, latency, error rates by provider).
- Budget enforcement (per-user, per-app, per-time-window).
Build vs buy
Early stage: use LiteLLM as a library, no gateway service. Multi-team, multi-app: stand up a gateway service (LiteLLM proxy or Portkey). Enterprise: a managed gateway with audit and compliance features.
Most teams need the gateway by their second app. Plan for it on day one.