Rate Limiting and Throttling Strategies
Rate limiting protects a service from both deliberate abuse and accidental overload. The patterns are well known; the implementation choices are what matter.
Why rate limit
Without rate limiting: a single client can take down the service.
With it: the service degrades predictably under abuse.
Four dimensions
- Per-IP: basic abuse protection.
- Per-API-key: fairness between tenants.
- Per-route: protects expensive endpoints.
- Global: protects downstream resources.
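The four dimensions can be layered by deriving one counter key per dimension and admitting a request only if every counter is under its limit. The following is a minimal sketch of the key derivation only. The `Request` type, the `limit_keys` helper, and the numbers in `LIMITS` are hypothetical, chosen for illustration rather than taken from any particular framework.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Request:
    # Hypothetical request shape: just the fields the limiter needs.
    ip: str
    api_key: Optional[str]
    route: str


# Assumed limits in requests per minute; tune these to real traffic.
LIMITS = {"ip": 300, "api_key": 1000, "route": 60, "global": 50_000}


def limit_keys(req: Request) -> List[Tuple[str, int]]:
    """Return (counter key, limit) pairs, one per dimension that applies."""
    keys = [(f"ip:{req.ip}", LIMITS["ip"])]
    if req.api_key is not None:
        # Anonymous requests have no tenant, so this dimension is skipped.
        keys.append((f"key:{req.api_key}", LIMITS["api_key"]))
    keys.append((f"route:{req.route}", LIMITS["route"]))
    keys.append(("global", LIMITS["global"]))
    return keys
```

Each key would then be checked against whichever algorithm backs the limiter. Keeping the key derivation separate from the counting makes it easier to add or remove a dimension later.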
Algorithm choices
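Token bucket: enforces an average rate while allowing bursts up to the bucket capacity; the common choice.
Leaky bucket: smooths output to a constant rate; rare in APIs.
Sliding window: precise limit over a rolling interval; costs more memory, since the log variant stores one timestamp per request.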
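To make the trade-off concrete, here is a minimal single-process sketch of a token bucket and a sliding-window log. The class names, parameters, and the injectable `clock` argument are assumptions made for illustration. A production limiter would typically keep this state in a shared store and handle concurrency, which this sketch does not.

```python
import time
from collections import deque


class TokenBucket:
    """Refills at `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst passes
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Credit tokens for the time elapsed, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SlidingWindowLog:
    """Allows at most `limit` requests in any rolling `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.stamps = deque()  # one timestamp per admitted request

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.stamps and self.stamps[0] <= now - self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False
```

The token bucket stores two numbers per key regardless of traffic, while the log grows with the limit. That difference is the memory cost noted above.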
Response patterns
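429 (Too Many Requests) with a Retry-After header is the correct response when a client exceeds its limit.
503 (Service Unavailable) when a global throttle is shedding load; clients should back off before retrying.
Burst tolerance keeps legitimate spikes from being blocked.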
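As an illustration of the response side, the sketch below builds the status and headers for a rejected request. It assumes a token-bucket limiter and is framework-agnostic. The function name and its parameters (`tokens`, `rate`, `cost`, `global_throttle`) are hypothetical. Sending Retry-After on the 503 as well is a choice made in this sketch. The header value uses the delay-in-seconds form of Retry-After, which must be a non-negative integer.

```python
import math


def throttle_response(tokens, rate, cost=1.0, global_throttle=False):
    """Return (status, headers) for a request the limiter rejected.

    tokens: tokens currently in the bucket (less than cost)
    rate:   refill rate in tokens per second
    """
    # Seconds until the bucket holds enough tokens for this request.
    wait = max(0.0, (cost - tokens) / rate)
    # Round up, and never advertise 0 for a request we just refused.
    retry_after = max(1, math.ceil(wait))
    status = 503 if global_throttle else 429
    return status, {"Retry-After": str(retry_after)}
```

Deriving the header from limiter state, rather than hard-coding a value, gives well-behaved clients a retry time that is likely to succeed.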
Antipatterns
- One global limit only. Tenant fairness lost.
- Token bucket with no burst headroom (capacity near one request). Legitimate spikes blocked.
- No Retry-After header. Clients guess when to retry and cause retry storms.
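The retry-storm antipattern also has a client-side half. Here is a hedged sketch of how a client might pick its delay: honor Retry-After when it is present, and otherwise fall back to capped exponential backoff with full jitter. The function name, the defaults, and the injectable `rng` are assumptions for illustration. Only the delay-in-seconds form of the header is parsed, so an HTTP-date value falls through to the backoff path.

```python
import random


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0,
                rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        # The server told us when to come back; trust it over our own guess.
        return float(retry_after.strip())
    # Full jitter: a uniform draw from [0, min(cap, base * 2**attempt)),
    # which spreads clients out instead of retrying in lockstep.
    return rng() * min(cap, base * (2 ** attempt))
```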
What to do this week
Three moves. (1) Apply rate limiting to your highest-risk network path. (2) Measure the failure rate on that path before and after. (3) Document the change so the next incident responder inherits the knowledge.