Circuit Breakers
Fail fast when a downstream dependency fails.
Overview
Circuit breakers fail fast when a downstream service is unhealthy. Retries alone amplify the failure, because every retry adds load to a dependency that is already struggling; circuit breakers stop the amplification and keep the upstream service from being dragged down with the dependency.
- Fail fast on downstream. Each downstream has its own circuit; while it is open, calls return an error immediately instead of queuing behind requests that are bound to fail.
- Failure-rate threshold. The circuit opens when the failure percentage over a rolling window of recent calls crosses a threshold; both the threshold and the window are tuned per dependency.
- Half-open state. After a cool-down, the circuit admits a single probe call; success closes the circuit and resumes traffic, failure reopens it.
- Per-circuit metrics plus cascade prevention. Each circuit exports its state and failure rate; one bad downstream cannot saturate the upstream service’s request pool.
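The closed, open, and half-open behavior described above can be sketched in a few dozen lines. This is a minimal, single-threaded illustration rather than a production implementation: the names `CircuitBreaker` and `CircuitOpenError`, the parameters `failure_threshold`, `window_size`, `min_calls`, and `open_seconds`, and the count-based rolling window are all assumptions made for the example.

```python
import time
from collections import deque
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without touching the downstream."""


class CircuitBreaker:
    """Hypothetical sketch; not thread-safe, one instance per downstream."""

    def __init__(self, failure_threshold=0.5, window_size=10,
                 min_calls=5, open_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failed fraction that opens the circuit
        self.min_calls = min_calls                  # do not judge on too few samples
        self.open_seconds = open_seconds            # cool-down before a probe is allowed
        self.clock = clock                          # injectable so tests can control time
        self.results = deque(maxlen=window_size)    # rolling window; True means failure
        self.state = State.CLOSED
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = State.HALF_OPEN  # cool-down elapsed: this call is the probe
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record(failed=True)
            raise
        self._record(failed=False)
        return result

    def _record(self, failed):
        if self.state is State.HALF_OPEN:
            # The single probe decides: reopen on failure, close on success.
            if failed:
                self._open()
            else:
                self.state = State.CLOSED
                self.results.clear()
            return
        self.results.append(failed)
        if len(self.results) >= self.min_calls:
            failure_rate = sum(self.results) / len(self.results)
            if failure_rate >= self.failure_threshold:
                self._open()

    def _open(self):
        self.state = State.OPEN
        self.opened_at = self.clock()
        self.results.clear()
```

A caller would wrap each outbound request, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile`, and treat `CircuitOpenError` as an immediate failure instead of retrying. A concurrent client would also need a lock and a guard so that only one probe runs at a time.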
The approach
In practice, the approach has five parts: one circuit instance per downstream, half-open recovery probes, exported state metrics, per-call timeouts aligned with the circuit, and a documented policy. Applied consistently, these produce resilient clients.
- Per-downstream circuit. One circuit per logical dependency; shared circuits hide failure modes and confuse recovery.
- Half-open recovery. Periodic probe; cleanly distinguishes "still bad" from "recovered" without a full-traffic surge.
- Monitor state. Open/closed/half-open exported as a metric; alerts on prolonged open state for fast investigation.
- Aligned timeouts plus documented policy. Per-call timeout shorter than the circuit’s; circuit policy committed to the repo for operational reviews.
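One way to make these practices concrete is a small registry that builds one circuit per dependency from a policy table kept in the repository. The sketch below is illustrative only: `CircuitPolicy`, `Circuit`, `CircuitRegistry`, the dependency names, and the numeric gauge encoding are hypothetical. It also assumes that "shorter than the circuit’s" means the per-call timeout should be shorter than the circuit's open interval, which is one possible reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CircuitPolicy:
    """Hypothetical per-dependency policy, meant to live in version control."""
    failure_threshold: float     # failed fraction (0..1] that opens the circuit
    window_size: int             # number of recent calls in the rolling window
    open_seconds: float          # how long the circuit stays open before a probe
    call_timeout_seconds: float  # timeout applied to each individual call

    def __post_init__(self):
        if not 0.0 < self.failure_threshold <= 1.0:
            raise ValueError("failure_threshold must be in (0, 1]")
        # Assumed alignment rule: a call must time out before the open interval ends.
        if self.call_timeout_seconds >= self.open_seconds:
            raise ValueError("per-call timeout must be shorter than open_seconds")


# Assumed numeric encoding so the state can be exported as a gauge metric.
STATE_GAUGE = {"closed": 0, "half_open": 1, "open": 2}


@dataclass
class Circuit:
    name: str
    policy: CircuitPolicy
    state: str = "closed"


class CircuitRegistry:
    """Hands out exactly one circuit per logical dependency."""

    def __init__(self, policies):
        self._policies = dict(policies)
        self._circuits = {}

    def get(self, dependency):
        if dependency not in self._policies:
            raise KeyError(f"no circuit policy documented for {dependency!r}")
        if dependency not in self._circuits:
            self._circuits[dependency] = Circuit(dependency, self._policies[dependency])
        return self._circuits[dependency]

    def metrics(self):
        # One gauge value per circuit, ready to hand to a metrics exporter.
        return {c.name: STATE_GAUGE[c.state] for c in self._circuits.values()}
```

Rejecting an undocumented dependency in `get` is a design choice for the sketch: it forces a policy entry to exist before a new client can call a downstream. An alert on prolonged open state could then be driven from the exported gauge.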
Why this compounds
Circuit breaker discipline compounds across services. Each protected call removes a cascade vector, the team’s resilience patterns spread, and new clients start from a pattern that already works.
- Better resilience. Circuit breakers prevent cascading failure; the upstream survives the downstream outage.
- Better incident response. Failed downstreams isolated; the on-call sees one failed dependency, not a fleet of saturated services.
- Better operational fit. The circuit becomes part of the standard client; new dependencies inherit the protection automatically.
- Institutional knowledge. Each circuit records how its dependency fails and recovers; the team’s reliability practice improves across releases.
Circuit breaker discipline pays off over years. Nova AI Ops integrates with circuit telemetry, surfaces patterns, and supports the team’s reliability discipline.