GLOSSARY · F

Failover

The automated switch from a failed primary to a standby replica, a cornerstone of high-availability architecture.

Definition

Failover is the process of switching from a failed primary system to a standby (secondary, replica, hot-spare) so the service stays up. Database failover, region failover, load-balancer failover, leader-election failover, every layer that aspires to high availability has a failover strategy. The two parameters that matter are RTO (Recovery Time Objective, how long the failover takes) and RPO (Recovery Point Objective, how much recent data you might lose). A 5-minute RTO and a 0-second RPO is dramatically more expensive to engineer than 1-hour RTO and 5-minute RPO.

Why it matters

Failover is the difference between an outage that lasts the meeting and one that lasts the day. But failover that hasn't been tested in months will break when needed; the configuration drifts, the standby falls behind, the DNS TTL is too long. Game days that exercise the failover path quarterly are the only way to keep the muscle alive.

How Nova handles it

See the part of the platform that handles failover in production.

Nova reliability snapshot

Failover

Definition

Why it matters

How Nova handles it

Related terms