Rate Limiting and Throttling Strategies

Rate limiting protects you from abuse and accidental load. The patterns are well-known; the implementation choices matter.

Why rate limit

Rate limiting is the single cheapest reliability mechanism most APIs ship without. The cost of adding it is small; the cost of skipping it is the next outage.

Without it. One bad client, one runaway batch job, or one credential leak can saturate the entire service.
With it. Excess traffic gets a 429; the service stays healthy for everyone else; degradation is predictable.
Cost protection. Per-tenant limits cap downstream cost (databases, third-party APIs) when traffic spikes.
Abuse posture. A limit with burst tolerance separates legitimate bursts from abuse: sustained abuse hits the limit, while a legitimate burst usually fits within the tolerance.

Four dimensions

1. Per-IP. Basic abuse protection.
2. Per-API-key. Fairness between tenants.
3. Per-route. Protects expensive endpoints.
4. Global. Protects downstream resources.
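As a rough sketch of how the four dimensions can combine, a single request can be checked against one counter per dimension and rejected if any of them is over budget. The names Limit, limits_for, and allowed, the key formats, and every numeric budget below are hypothetical placeholders chosen for illustration, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    key: str         # identity of the counter in the limiter's store
    per_minute: int  # hypothetical budget for that counter


def limits_for(ip: str, api_key: str, route: str) -> list[Limit]:
    """Return every limit one request must pass (all numbers are placeholders)."""
    return [
        Limit(f"ip:{ip}", 600),        # 1. per-IP
        Limit(f"key:{api_key}", 1200), # 2. per-API-key
        Limit(f"route:{route}", 60),   # 3. per-route
        Limit("global", 50_000),       # 4. global
    ]


def allowed(counts: dict[str, int], limits: list[Limit]) -> bool:
    """A request passes only if it is under budget on every dimension."""
    return all(counts.get(limit.key, 0) < limit.per_minute for limit in limits)
```

Here counts stands in for whatever store holds the current per-minute totals; a request from a well-behaved tenant is still rejected if it targets a route that has already used its budget.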

Algorithm choices

Three algorithms cover almost all rate-limiting needs. Pick by whether you want bursts, smoothing, or precision over memory.

Token bucket. Tokens accrue at a steady rate; requests consume tokens; bursts are allowed up to the bucket size.
Leaky bucket. Smooths output to a constant rate; rare in APIs, common in egress traffic shaping.
Sliding window. Counts requests in a rolling time window; most precise; more memory per key.
Default pick. Token bucket for APIs; the burst tolerance matches user expectation and the implementation is simple.
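To make the trade-off concrete, here is a minimal single-process sketch of a token bucket next to a sliding-window log. The class names, the injectable clock parameter, and the retry_after helper are assumptions made for this example; a production limiter would also need locking and a shared store, which are left out.

```python
import time
from collections import deque


class TokenBucket:
    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # bucket size, i.e. the largest allowed burst
        self.clock = clock        # injectable so refill can be tested
        self.tokens = capacity    # start full so an initial burst succeeds
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def allow(self, cost: float = 1.0) -> bool:
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def retry_after(self, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens are available (0.0 if available now)."""
        self._refill()
        return max(0.0, (cost - self.tokens) / self.rate)


class SlidingWindowLog:
    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit    # maximum requests inside any window
        self.window = window  # window length in seconds
        self.clock = clock
        self.hits = deque()   # one timestamp per accepted request

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the rolling window.
        while self.hits and self.hits[0] <= now - self.window:
            self.hits.popleft()
        if len(self.hits) < self.limit:
            self.hits.append(now)
            return True
        return False
```

The memory difference shows up in the state each one keeps: the bucket stores two numbers per key, while the log stores one timestamp per accepted request in the window.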

Response patterns

How you reject matters as much as whether you reject. The right response codes and headers turn rate limits into a contract clients can respect.

429 + Retry-After. The standard response; Retry-After tells clients how long to wait before retrying.
503 + backoff. Use 503 for global throttling and load shedding; clients should already back off exponentially on 5xx.
X-RateLimit headers. X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset are a widely used convention rather than a formal standard; they let clients self-pace.
Burst tolerance. Allow short bursts above the rate; legitimate spikes survive, sustained abuse does not.
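A rejection that follows these patterns might be assembled as below. This is a framework-agnostic sketch: the function names and the (status, headers, body) return shape are hypothetical, and it assumes X-RateLimit-Reset carries a Unix timestamp in seconds, which is one common reading of that header but not the only one in use.

```python
import math
import time


def rate_limit_headers(limit: int, remaining: int, reset_epoch: float) -> dict[str, str]:
    """Headers to attach to every response so clients can self-pace."""
    return {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        # Assumption: reset is expressed as Unix epoch seconds.
        "X-RateLimit-Reset": str(math.ceil(reset_epoch)),
    }


def too_many_requests(limit: int, reset_epoch: float, now=None):
    """Build a 429 response as a (status, headers, body) tuple."""
    now = time.time() if now is None else now
    # Retry-After in its delay form is a whole number of seconds; round up
    # so clients never retry before the window has actually reset.
    wait = max(1, math.ceil(reset_epoch - now))
    headers = rate_limit_headers(limit, 0, reset_epoch)
    headers["Retry-After"] = str(wait)
    return 429, headers, "rate limit exceeded\n"
```

Rounding the wait up rather than down is a deliberate choice here: a client that retries a fraction of a second early just earns another 429.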

Antipatterns

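One global limit only. Tenant fairness is lost: one noisy tenant can consume the budget for everyone.
Token bucket without burst tolerance. A bucket too small to absorb bursts blocks legitimate spikes.
No Retry-After header. Clients have no signal for when to retry, so they retry immediately and create a retry storm.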
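The retry-storm antipattern has a client-side counterpart worth sketching: a client that honors Retry-After when the server sends it, and otherwise falls back to exponential backoff with jitter. The function name retry_delay and the default base and cap values are assumptions for this example, and only the delay-in-seconds form of Retry-After is parsed; the HTTP-date form falls through to backoff.

```python
import random
from typing import Optional


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 0.5, cap: float = 60.0,
                rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # The server's hint wins over any locally computed backoff.
            return float(max(0, int(retry_after)))
        except ValueError:
            pass  # HTTP-date form is not handled in this sketch
    # Exponential backoff capped at `cap`, with full jitter so that many
    # clients rejected at the same moment do not retry in lockstep.
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

The jitter matters as much as the exponent: without it, clients that were throttled together come back together.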

What to do this week

Three moves. (1) Add a rate limit to your highest-risk endpoint. (2) Measure the rate of rejected and failed requests before and after. (3) Document the change so the next incident responder inherits the knowledge.