The automated switch from a failed primary to a standby replica, a cornerstone of high-availability architecture.
Failover is the process of switching from a failed primary system to a standby (secondary, replica, or hot spare) so the service stays up. Every layer that aspires to high availability has a failover strategy: database failover, region failover, load-balancer failover, leader-election failover. The two parameters that matter are RTO (Recovery Time Objective, the maximum acceptable time to restore service) and RPO (Recovery Point Objective, the maximum amount of recent data, measured in time, that you can afford to lose). A 5-minute RTO with a 0-second RPO is dramatically more expensive to engineer than a 1-hour RTO with a 5-minute RPO, because a zero RPO generally requires synchronous replication to the standby.
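As a rough illustration of how these pieces relate, here is a minimal sketch of a health-check-driven failover loop. Everything in it is hypothetical (the `Node` and `FailoverController` names, the three-failure threshold, the `worst_case_rto_s` helper); real systems add fencing, quorum, and client redirection that this sketch leaves out. It shows the switch itself, the data-loss window that corresponds to RPO, and a back-of-the-envelope RTO estimate.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    healthy: bool = True
    replication_lag_s: float = 0.0  # seconds the standby trails the primary


class FailoverController:
    """Promotes the standby after several consecutive failed health checks."""

    def __init__(self, primary: Node, standby: Node, failure_threshold: int = 3):
        self.primary = primary
        self.standby = standby
        self.failure_threshold = failure_threshold  # assumed value, tune per system
        self.consecutive_failures = 0
        self.data_loss_window_s = 0.0  # observed data loss of the last failover

    def observe(self) -> bool:
        """Process one health-check result; return True if a failover happened."""
        if self.primary.healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.failure_threshold:
            return False  # tolerate transient blips
        if not self.standby.healthy:
            return False  # nothing safe to promote
        # Writes the standby had not yet received are lost on promotion.
        self.data_loss_window_s = self.standby.replication_lag_s
        self.primary, self.standby = self.standby, self.primary
        self.consecutive_failures = 0
        return True


def worst_case_rto_s(check_interval_s, failure_threshold, promotion_s, dns_ttl_s):
    """Detection time + promotion time + time for clients to see the new address."""
    return check_interval_s * failure_threshold + promotion_s + dns_ttl_s
```

Under these assumptions, 10-second health checks with a threshold of 3, a 30-second promotion, and a 60-second DNS TTL give an estimated worst-case RTO of 120 seconds, while the data lost is whatever replication lag the standby had at the moment of promotion.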
Failover is the difference between an outage that lasts the meeting and one that lasts the day. But a failover path that hasn't been tested in months tends to break when it is needed: the configuration drifts, the standby falls behind, the DNS TTL is too long. Game days that exercise the failover path quarterly are the only way to keep the muscle alive.
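The failure modes listed above can also be checked between game days. The following is a minimal sketch of such a readiness check; the `StandbyStatus` fields, the `failover_readiness_issues` function, and the 90-day drill interval are all assumptions made for illustration, and how you would actually collect these values depends on your stack.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StandbyStatus:
    # All fields are hypothetical inputs gathered by your own monitoring.
    replication_lag_s: float
    dns_ttl_s: int
    config_hash: str
    primary_config_hash: str
    days_since_last_drill: int


def failover_readiness_issues(
    status: StandbyStatus,
    rto_s: float,
    rpo_s: float,
    max_drill_age_days: int = 90,  # roughly quarterly
) -> List[str]:
    """Return human-readable reasons the standby may not meet its targets."""
    issues = []
    if status.config_hash != status.primary_config_hash:
        issues.append("configuration drift between primary and standby")
    if status.replication_lag_s > rpo_s:
        issues.append("standby lag exceeds the RPO")
    if status.dns_ttl_s >= rto_s:
        issues.append("DNS TTL alone consumes the RTO budget")
    if status.days_since_last_drill > max_drill_age_days:
        issues.append("failover path not exercised within the drill interval")
    return issues
```

A check like this only flags known risks early. It does not replace the drill itself, which is what exercises the real failover path end to end.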
See the part of the platform that handles failover in production.