Load Shedding Pattern
Drop to survive.
Overview
Load shedding deliberately rejects some requests during overload so the service can keep serving the rest. Capacity planning covers the average load; load shedding covers the spike that exceeds capacity. The discipline lies in rejecting cleanly, with an HTTP signal that tells clients to back off.
- Drop to survive. Shedding some requests during overload is the alternative to a full collapse for everyone.
- Per-priority shedding. Requests are shed in priority order, which matches business value by preserving the most important requests.
- Backpressure signal. An overloaded service signals its load upstream, so that downstream load shapes the upstream request rate.
- 503 with Retry-After plus per-tier protection. A 503 with a Retry-After header is the proper HTTP response for a shed request; a per-tier shedding policy ties shedding to priority.
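As a rough illustration of the last point, here is a minimal sketch of a concurrency limiter that sheds with a 503 and a Retry-After header once a fixed number of requests are in flight. The names `InFlightLimiter`, `Response`, `max_in_flight`, and `retry_after_seconds` are hypothetical and not tied to any framework; a real service would wire the same check into its own middleware.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical stand-in for a framework's HTTP response type."""
    status: int
    headers: dict = field(default_factory=dict)
    body: str = ""


class InFlightLimiter:
    """Admit at most `max_in_flight` concurrent requests; shed the rest."""

    def __init__(self, max_in_flight: int, retry_after_seconds: int = 2):
        self.max_in_flight = max_in_flight
        self.retry_after_seconds = retry_after_seconds
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        # Reserve a slot if one is free; never block the caller.
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False
            self._in_flight += 1
            return True

    def release(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def handle(self, handler) -> Response:
        # Shed cheaply, before doing any real work for the request.
        if not self.try_acquire():
            return Response(
                status=503,
                headers={"Retry-After": str(self.retry_after_seconds)},
                body="overloaded, retry later",
            )
        try:
            return handler()
        finally:
            self.release()
```

The rejection path does no work beyond a counter check, which is the point: a shed request should cost far less than a served one. The Retry-After value here is a fixed number of seconds chosen for illustration.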
The approach
The practical approach combines five elements: priority-ordered shedding, backpressure to upstream callers, 503 responses with Retry-After, a shedding policy per tier, and documented rules for each system. Applied consistently, they produce real overload resilience instead of full collapse.
- Per-priority shedding. Requests are shed in priority order, so the most valuable requests survive.
- Backpressure signal. The service reports its load to upstream callers, so overload propagates through the system as a reduced request rate.
- 503 with Retry-After. Every shed request gets the proper response, and clients back off according to the header.
- Per-tier protection plus documented policy. Each tier has its own shedding policy, and the rules for each system are written down and committed for operational reviews.
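The per-tier policy and the backpressure signal can be sketched together, under stated assumptions. The tier names, the utilization thresholds (70%, 85%, 100%), and the linear `backpressure_hint` are all hypothetical values chosen for illustration; a real policy would come from the documented rules for each system.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Hypothetical priority tiers; lower value means more important."""
    CRITICAL = 0
    NORMAL = 1
    BEST_EFFORT = 2


# Assumed policy: utilization at or above which each tier is shed.
# Lower-priority tiers are shed earlier, so critical traffic survives longest.
SHED_AT = {
    Tier.BEST_EFFORT: 0.70,
    Tier.NORMAL: 0.85,
    Tier.CRITICAL: 1.00,
}

# Utilization at which the upstream hint starts asking callers to slow down.
SOFT_LIMIT = 0.70


def utilization(in_flight: int, capacity: int) -> float:
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return in_flight / capacity


def should_shed(tier: Tier, in_flight: int, capacity: int) -> bool:
    """Return True when a request of this tier should be rejected."""
    return utilization(in_flight, capacity) >= SHED_AT[tier]


def backpressure_hint(in_flight: int, capacity: int) -> float:
    """Fraction of its current send rate an upstream caller should keep.

    1.0 means no change; the value falls linearly to 0.0 as utilization
    rises from SOFT_LIMIT to full capacity.
    """
    u = utilization(in_flight, capacity)
    if u <= SOFT_LIMIT:
        return 1.0
    return max(0.0, (1.0 - u) / (1.0 - SOFT_LIMIT))
```

For example, at 75 in-flight requests out of a capacity of 100, `should_shed(Tier.BEST_EFFORT, 75, 100)` is true while `should_shed(Tier.NORMAL, 75, 100)` is false, so best-effort traffic is rejected first. Keeping the thresholds in one table makes the policy easy to review and to commit alongside the service.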
Why this compounds
Load shedding discipline compounds across services. Each well-designed shedding policy keeps paying off in resilience, the team builds reliability expertise, and new services inherit the existing shedding patterns.
- Better overload resilience. With the right shedding in place, the service stays up under spike load instead of collapsing.
- Better user experience. The right requests survive, so the most valuable users still get served.
- Better operational fit. The shedding rules reflect the actual workload rather than a generic default.
- Institutional knowledge. Each policy teaches reliability patterns, and the team gets better at applying them.
Load shedding is an engineering discipline that pays off across years. Nova AI Ops integrates with overload telemetry, surfaces patterns, and supports the team’s reliability discipline.