By Samson Tanimawo, PhD · Published Dec 13, 2026 · 11 min read

The Cloudflare 2024 BGP Outage: An SRE Postmortem Walkthrough

A walkthrough framed for SREs whose own services depend on edge providers. The lessons cost Cloudflare hours; we get them for the price of reading.

What happened, in plain terms

A configuration change to internal BGP propagated incorrectly, advertising routes that should have stayed inside Cloudflare's network as if they were public. Routers around the world began preferring those routes, and traffic that should have reached customer origins was instead pulled into Cloudflare's network and black-holed.

The incident was global, not regional. That is the defining property of BGP problems: they propagate at the speed of routing announcements, which is fast and not bounded by geography.

How a route leak cascades
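
The postmortem doesn't publish the exact prefixes involved, but the mechanics are ordinary best-path selection: a leaked route that is more specific than the legitimate one wins the lookup on every router that accepts it, so the mistake spreads as fast as the announcement does. A minimal sketch of that selection logic, with made-up prefixes and documentation-range AS numbers:

```python
from ipaddress import ip_address, ip_network

# Hypothetical routing table: (prefix, AS path, label). The /22 stands in
# for the legitimate public route; the /24 is the leaked, more-specific
# internal route that should never have left the provider's network.
ROUTES = [
    (ip_network("203.0.112.0/22"), [64500, 64496], "legitimate public route"),
    (ip_network("203.0.113.0/24"), [64500, 64499, 64496], "leaked internal route"),
]

def best_route(dst: str):
    """Pick a route the way a router's lookup does: the longest matching
    prefix wins outright; AS-path length only breaks ties."""
    addr = ip_address(dst)
    candidates = [r for r in ROUTES if addr in r[0]]
    return max(candidates, key=lambda r: (r[0].prefixlen, -len(r[1])))

prefix, as_path, label = best_route("203.0.113.10")
print(f"traffic follows {prefix} via AS path {as_path} ({label})")
# The leaked /24 wins despite its longer AS path, so every router that
# accepts the announcement starts forwarding traffic toward the leak.
```

The point of the sketch is the ordering: prefix length beats AS-path length, which is why a leaked more-specific route cascades globally instead of staying a local curiosity.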

Why detection took longer than it should have

Internal alerts caught it; external monitoring (the customer view) caught it slightly later because most external monitors live in or near the affected paths. The lesson: at least one monitoring vantage point must be outside the provider’s network.
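
One concrete way to act on that lesson is a probe that checks the same service both through the CDN hostname and directly against the origin; when the two disagree, the problem is on the provider's path rather than in your own stack. The sketch below uses placeholder URLs, and it only counts as an external vantage point if it actually runs from outside the provider's network:

```python
import urllib.request

# Placeholder endpoints: the public, CDN-fronted hostname and the origin
# reached directly (an IP or hostname that does not resolve through the
# provider). Both are assumptions for this sketch.
CDN_URL = "https://www.example.com/healthz"
ORIGIN_URL = "https://origin.example.com/healthz"

def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

cdn_ok, origin_ok = probe(CDN_URL), probe(ORIGIN_URL)
if origin_ok and not cdn_ok:
    # The service is healthy but unreachable through the provider:
    # exactly the signal an all-internal monitor would have missed.
    print("ALERT: origin healthy, CDN path failing -- suspect provider or routing issue")
elif not origin_ok:
    print("ALERT: origin itself failing")
else:
    print("OK: both paths healthy")
```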

The post-incident review noted that the initial rollback was complicated by the fact that the routing change had also been replicated to backup paths. Backups inherited the bug.
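
The report doesn't describe Cloudflare's rollout tooling, but the failure mode it names, backups receiving the same bad change as the primaries, suggests an obvious gate: validate on a canary, then on the primaries, and only then touch the backup paths. A hedged sketch of that shape, with all names and checks invented for illustration:

```python
import time

# Illustrative config-change pipeline: nothing here is Cloudflare's actual
# tooling, just the shape of a gate that keeps backups one step behind.

def apply_to(target: str, change: dict) -> None:
    print(f"applying {change['id']} to {target}")
    # ... push the routing/config change to the target ...

def validated(target: str) -> bool:
    # Placeholder health check: in practice, verify announced prefixes,
    # traffic levels, and error rates after the change lands.
    time.sleep(1)
    return True

def rollout(change: dict) -> None:
    apply_to("canary-pop", change)
    if not validated("canary-pop"):
        raise RuntimeError("canary failed; change never reaches primaries or backups")
    apply_to("primary-paths", change)
    if not validated("primary-paths"):
        raise RuntimeError("primaries failed; backups still hold the last good config")
    # Backups are updated last, so a rollback can always fall back to them.
    apply_to("backup-paths", change)

rollout({"id": "bgp-change-1234"})
```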

Four mitigations to put in place this quarter

Antipatterns this incident exposed

What to do this week

Three moves:

1. Verify your monitoring includes at least one vantage point outside your CDN.
2. Document the CDN-bypass runbook for at least one critical service (a minimal origin check is sketched below).
3. Add a quarterly tabletop exercise, ‘CDN provider is unreachable for 90 minutes’, to your incident-response calendar.
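
For item (2), the heart of a CDN-bypass runbook is proving the origin can serve production traffic without the provider in the path. A minimal sketch, with a placeholder origin IP and hostname; note it relaxes TLS verification because the certificate is issued for the public hostname, which a real runbook should verify explicitly:

```python
import http.client
import ssl

# Placeholders: the origin's direct address and the public hostname that
# normally resolves to the CDN. Substitute your own values.
ORIGIN_IP = "198.51.100.10"
PUBLIC_HOST = "www.example.com"

def origin_can_serve(path: str = "/healthz") -> bool:
    """Connect straight to the origin IP while sending the public Host
    header, mimicking what clients would see if DNS bypassed the CDN."""
    ctx = ssl.create_default_context()
    # The origin's certificate is issued for the public hostname, not the
    # bare IP, so hostname checking is relaxed for this sketch; a real
    # runbook should verify the certificate against PUBLIC_HOST instead.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    conn = http.client.HTTPSConnection(ORIGIN_IP, context=ctx, timeout=5)
    try:
        conn.request("GET", path, headers={"Host": PUBLIC_HOST})
        return conn.getresponse().status == 200
    finally:
        conn.close()

print("origin ready for CDN bypass:", origin_can_serve())
```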