Facebook BGP 2021

Total outage.

Overview

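The October 4, 2021 Facebook outage took Facebook, Instagram, WhatsApp, and Messenger offline for roughly six hours. A command issued during routine backbone maintenance unintentionally disconnected Facebook's data centers from one another. Facebook's DNS servers, no longer able to reach those data centers, then withdrew their BGP route advertisements as designed, which removed Facebook's authoritative DNS from the internet. Recovery was slow because the internal tooling engineers needed to fix the problem depended on the very network that was down, so engineers had to work on site at the data centers. The case study generalises: when your fix-it tools share dependencies with the broken systems, you have built a recovery dead-end.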
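A simplified model helps show why a withdrawal rule meant to improve resilience turned a backbone failure into a global DNS outage. The sketch assumes each edge DNS node announces its anycast prefix only while it can reach at least one data center. The class `EdgeDnsNode` and the helper functions are hypothetical and are not Facebook's code.

```python
from dataclasses import dataclass, field


@dataclass
class EdgeDnsNode:
    """Hypothetical, simplified edge DNS node; not Facebook's implementation."""
    name: str
    reachable_datacenters: set[str] = field(default_factory=set)

    def is_healthy(self) -> bool:
        # Assumed health rule: the node can still reach at least one
        # data center over the backbone.
        return len(self.reachable_datacenters) > 0

    def announces_dns_prefix(self) -> bool:
        # Healthy nodes keep announcing the anycast DNS prefix over BGP;
        # unhealthy nodes withdraw it so resolvers are steered elsewhere.
        return self.is_healthy()


def announcing_nodes(nodes: list[EdgeDnsNode]) -> list[str]:
    """Names of nodes still announcing the DNS prefix to the internet."""
    return [n.name for n in nodes if n.announces_dns_prefix()]


def lose_backbone(nodes: list[EdgeDnsNode]) -> None:
    """Simulate the whole backbone going down: no node reaches any data center."""
    for n in nodes:
        n.reachable_datacenters.clear()


if __name__ == "__main__":
    nodes = [
        EdgeDnsNode("edge-a", {"dc1"}),
        EdgeDnsNode("edge-b", {"dc1", "dc2"}),
        EdgeDnsNode("edge-c", {"dc2"}),
    ]
    # Local failure: one node loses its path and the others cover for it.
    nodes[0].reachable_datacenters.clear()
    print(announcing_nodes(nodes))  # ['edge-b', 'edge-c']
    # Global failure: every node withdraws at once and the prefix vanishes.
    lose_backbone(nodes)
    print(announcing_nodes(nodes))  # []
```

The same rule that correctly drains one broken node is what erases the service when every node fails the health check at the same moment.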

The approach

Five disciplines turn the Facebook lesson into operational practice: break-glass procedures that do not depend on production systems, out-of-band access to the management network, configuration validation before deploy, dependency mapping that catches circular reachability, and regular game-day exercises that prove recovery actually works.
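Dependency mapping for circular reachability can start as a plain graph search: for each recovery tool, look for a chain of dependencies that ends in a production system that might be down. This is a minimal sketch assuming a hand-maintained dependency map; the service names in `DEPENDENCIES` and the helper functions are hypothetical.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it needs to work.
DEPENDENCIES: dict[str, list[str]] = {
    "admin-console": ["sso", "internal-dns"],
    "sso": ["internal-dns"],
    "internal-dns": ["backbone"],
    "chat-ops": ["sso"],
    "oob-console": ["oob-network"],
    "oob-network": [],
}


def dependency_chain(graph, start, target):
    """Breadth-first search for a dependency chain from start to target.

    Returns the shortest chain as a list of service names, or None.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk parent links back to the start to rebuild the chain.
            chain = []
            while node is not None:
                chain.append(node)
                node = parents[node]
            return chain[::-1]
        for dep in graph.get(node, []):
            if dep not in parents:
                parents[dep] = node
                queue.append(dep)
    return None


def recovery_dead_ends(graph, recovery_tools, failure_domain):
    """Map each recovery tool that transitively depends on the failure
    domain to the first dependency chain found."""
    findings = {}
    for tool in recovery_tools:
        for system in failure_domain:
            chain = dependency_chain(graph, tool, system)
            if chain is not None:
                findings[tool] = chain
                break
    return findings


if __name__ == "__main__":
    tools = ["admin-console", "chat-ops", "oob-console"]
    for tool, chain in recovery_dead_ends(DEPENDENCIES, tools, ["backbone"]).items():
        print(f"{tool}: {' -> '.join(chain)}")
```

In this toy map, `admin-console` and `chat-ops` are flagged because both reach `backbone`, while `oob-console` is not. Reporting the full chain, not just a yes/no, tells reviewers which link to cut.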

Why this compounds

Each architecture review that applies the Facebook lesson can catch a circular dependency before it becomes the next outage. Out-of-band access shortens worst-case MTTR because engineers can still reach devices when the production network cannot. Configuration validation targets the class of mistake that triggered this incident. By year two, the team's resilience model reflects the lesson instead of the team having to learn it the hard way.
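Configuration validation of this kind can be sketched as a pre-deploy check that applies a proposed change to a model of the backbone and rejects it if any site would be cut off. Facebook's post-incident write-up said an audit tool was supposed to stop the faulty command but a bug let it through; this sketch does not reproduce that tool. The site names, the undirected-link model, and the `ChangeRejected` exception are assumptions for illustration.

```python
class ChangeRejected(Exception):
    """Raised when a proposed change would partition the backbone."""


def is_connected(sites: set[str], links: set[frozenset[str]]) -> bool:
    """True if every site can reach every other site over the given links."""
    if not sites:
        return True
    adjacency: dict[str, set[str]] = {site: set() for site in sites}
    for link in links:
        a, b = tuple(link)
        adjacency[a].add(b)
        adjacency[b].add(a)
    # Depth-first walk from an arbitrary site.
    start = next(iter(sites))
    seen = {start}
    stack = [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(sites)


def validate_change(sites, links, links_to_disable):
    """Apply the change to a model of the backbone before touching real routers.

    Links are (site, site) pairs; direction does not matter.
    Returns the post-change link set, or raises ChangeRejected.
    """
    current = {frozenset(link) for link in links}
    after = current - {frozenset(link) for link in links_to_disable}
    if not is_connected(set(sites), after):
        raise ChangeRejected(
            f"disabling {len(current) - len(after)} link(s) would partition the backbone"
        )
    return after


if __name__ == "__main__":
    sites = {"dc1", "dc2", "dc3"}
    links = [("dc1", "dc2"), ("dc2", "dc3"), ("dc1", "dc3")]
    validate_change(sites, links, [("dc2", "dc1")])  # ring still connected: accepted
    try:
        validate_change(sites, links, links)  # every link down: rejected
    except ChangeRejected as err:
        print(f"rejected: {err}")
```

Checking connectivity of the post-change topology, rather than counting how many links a change touches, also catches changes that look small on their own but isolate a site.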