Cloudflare 2019 Routing Incident
BGP gone wrong.
Overview
The Cloudflare 2019 routing incident was a multi-hour outage triggered by a BGP route leak from a regional ISP. The lessons that emerged reshaped how teams think about BGP propagation, peer filtering, and cryptographic origin validation.
- The leak. A regional ISP advertised routes it should not have, and upstream peers accepted them without filtering.
- Propagation speed. Bad routes reached global routing tables in minutes. The internet’s default behaviour is fast propagation, not careful validation.
- AS-level filtering gap. Many operators trusted peers without enforcing route filters. The trust model relied on every peer being well-behaved.
- RPKI as the missing layer. Cryptographic origin validation existed but was not widely deployed. The incident accelerated adoption across the industry.
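A toy propagation model shows why the filtering gap matters. This is a minimal sketch, not a BGP simulator. It assumes that every AS hearing the leak either drops it (its import filter rejects the route on every session) or accepts it and re-advertises it to all neighbours. It ignores path selection, export policy, and timing. The function name, AS names, and topology are hypothetical.

```python
from collections import deque


def leak_reach(graph: dict[str, list[str]], leaker: str, filtering: set[str]) -> set[str]:
    """Return the set of ASes that end up carrying a leaked route.

    Toy model (hypothetical): an AS that hears the leak accepts it unless it
    filters, and every accepting AS re-advertises it to all of its neighbours.
    """
    accepted = {leaker}
    queue = deque([leaker])
    while queue:
        asn = queue.popleft()
        for neighbour in graph.get(asn, []):
            if neighbour in accepted or neighbour in filtering:
                continue  # already has the route, or its import filter drops it
            accepted.add(neighbour)
            queue.append(neighbour)  # accepted, so it re-advertises onward
    return accepted


# Hypothetical topology: a leaker, its upstream, and two further hops.
topology = {
    "LEAKER": ["UPSTREAM"],
    "UPSTREAM": ["LEAKER", "TRANSIT_A", "TRANSIT_B"],
    "TRANSIT_A": ["UPSTREAM", "EYEBALL_ISP"],
    "TRANSIT_B": ["UPSTREAM"],
    "EYEBALL_ISP": ["TRANSIT_A"],
}

print(sorted(leak_reach(topology, "LEAKER", filtering=set())))
# ['EYEBALL_ISP', 'LEAKER', 'TRANSIT_A', 'TRANSIT_B', 'UPSTREAM']
print(sorted(leak_reach(topology, "LEAKER", filtering={"UPSTREAM"})))
# ['LEAKER']
```

With no filtering, the leak reaches every AS in the graph. A single filtering upstream next to the leaker contains it completely. This is the dependency the trust model above creates: the outcome hinges on the first hop doing its job.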
The approach
Defence against BGP misadventure is layered. RPKI proves which AS may originate a prefix. AS-level filtering catches what RPKI does not, such as a leak that preserves the legitimate origin AS. Monitoring catches what filtering misses.
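The origin check that RPKI enables fits in a few lines. In practice, a relying-party validator verifies the ROA signatures and hands the router a list of validated ROA payloads (VRPs). The router then classifies each route as valid, invalid, or not-found, following RFC 6811. The sketch below is a minimal illustration that assumes the VRP list is already available. `Vrp` and `validate_origin` are hypothetical names, the prefixes come from documentation ranges, the AS numbers are documentation ASNs, and AS_SET origins are not handled.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass
from enum import Enum


class Validity(Enum):
    VALID = "valid"
    INVALID = "invalid"
    NOT_FOUND = "not-found"


@dataclass(frozen=True)
class Vrp:
    """Hypothetical VRP record: covering prefix, max length, authorised origin AS."""
    prefix: ipaddress.IPv4Network | ipaddress.IPv6Network
    max_length: int
    origin_as: int


def validate_origin(route_prefix: str, origin_as: int, vrps: list[Vrp]) -> Validity:
    """Classify one announcement against the VRP set (RFC 6811 semantics)."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        # A VRP only applies if its prefix covers the announced prefix.
        if route.version != vrp.prefix.version or not route.subnet_of(vrp.prefix):
            continue
        covered = True
        # A match needs the authorised origin AS and a length within max_length.
        if origin_as == vrp.origin_as and route.prefixlen <= vrp.max_length:
            return Validity.VALID
    # Covered but never matched means invalid; no covering VRP means not-found.
    return Validity.INVALID if covered else Validity.NOT_FOUND


# Hypothetical VRP: AS64500 may originate 198.51.100.0/24, nothing longer.
vrps = [Vrp(ipaddress.ip_network("198.51.100.0/24"), max_length=24, origin_as=64500)]

print(validate_origin("198.51.100.0/24", 64500, vrps))  # Validity.VALID
print(validate_origin("198.51.100.0/25", 64500, vrps))  # Validity.INVALID (too long)
print(validate_origin("198.51.100.0/24", 64511, vrps))  # Validity.INVALID (wrong origin)
print(validate_origin("203.0.113.0/24", 64500, vrps))   # Validity.NOT_FOUND
```

The `/25` announcement is classified invalid even though it has the right origin, because it is longer than the ROA's maximum length. This is how origin validation rejects unauthorised more-specifics. Routes classified not-found are typically still accepted, which is why the other layers remain necessary.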
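- RPKI for prefixes. Route origin validation drops announcements whose origin AS or prefix length contradicts a signed Route Origin Authorization (ROA). Every network that enforces it stops invalid routes from propagating further, so the internet's baseline trust model gets stronger.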
- AS-level peer filtering. Each peer session accepts only the prefixes that the peer's AS, and for customers its own downstream customers, legitimately announces. A misbehaving peer cannot push routes outside that list past the filter.
- Route-change monitoring. Per-prefix announcement tracking flags hijacks within minutes. Tools like BGPmon and RIPE Atlas surface anomalies.
- Looking-glass visibility. Per-region views of the BGP table during an incident shorten the time to root cause. Without them, BGP debugging is guesswork.
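Peer filtering and per-prefix monitoring can be sketched the same way. This is a hypothetical illustration, not any router's or monitoring tool's API. `PeerFilter` mimics a per-peer prefix list with a maximum-length bound. `OriginMonitor` raises an alert when a watched prefix appears with an unexpected origin AS or as a new more-specific. Prefixes and AS numbers are documentation values.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass, field


@dataclass
class PeerFilter:
    """Hypothetical per-peer import filter built from an allow-list.

    Each entry is (allowed prefix, longest prefix length to accept), similar in
    spirit to a prefix-list entry with an upper length bound.
    """
    allowed: list[tuple[str, int]]

    def accepts(self, prefix: str) -> bool:
        route = ipaddress.ip_network(prefix)
        for allowed_prefix, max_len in self.allowed:
            net = ipaddress.ip_network(allowed_prefix)
            if (route.version == net.version and route.subnet_of(net)
                    and route.prefixlen <= max_len):
                return True
        return False  # anything not on the allow-list is dropped


@dataclass
class OriginMonitor:
    """Hypothetical per-prefix monitor: alerts on unexpected origins or more-specifics."""
    expected: dict[str, int]  # watched prefix -> expected origin AS
    alerts: list[str] = field(default_factory=list)

    def observe(self, prefix: str, origin_as: int) -> None:
        route = ipaddress.ip_network(prefix)
        for watched, expected_as in self.expected.items():
            net = ipaddress.ip_network(watched)
            if route.version != net.version or not route.subnet_of(net):
                continue  # announcement is outside this watched prefix
            if origin_as != expected_as:
                self.alerts.append(
                    f"unexpected origin: {prefix} from AS{origin_as}, expected AS{expected_as}")
            elif route.prefixlen > net.prefixlen:
                self.alerts.append(f"new more-specific: {prefix} from AS{origin_as}")


peer = PeerFilter(allowed=[("198.51.100.0/24", 24)])
print(peer.accepts("198.51.100.0/24"))  # True
print(peer.accepts("203.0.113.0/24"))   # False

monitor = OriginMonitor(expected={"198.51.100.0/24": 64500})
monitor.observe("198.51.100.0/24", 64500)    # expected announcement: no alert
monitor.observe("198.51.100.128/25", 64500)  # more-specific with the right origin
monitor.observe("198.51.100.0/24", 64511)    # same prefix, different origin
for alert in monitor.alerts:
    print(alert)
```

Real deployments typically generate allow-lists from routing registry data rather than writing them by hand. The hard-coded lists here only keep the sketch self-contained.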
Why this compounds
Each architecture review that applies the Cloudflare lessons hardens one more network. The compounding works because BGP defence is collective: each operator’s filtering benefits every operator’s reachability.
- Lower hijack risk. RPKI plus filtering reduces the probability of accepting a leaked or hijacked route.
- Faster incident response. Looking-glass tools and per-prefix monitoring shorten the time it takes to find the root cause of a BGP incident.
- Industry learning. Publicly shared incident reports lift the floor for every operator. The commons benefits.
- Year-one investment, year-two habit. The first BGP review is a heavy lift. Subsequent reviews reuse the patterns and run faster.