Failover is the process of automatically switching to a backup system when a primary system fails. It is a core reliability mechanism used in distributed systems, databases, servers, and cloud infrastructure. Failover can involve redirecting traffic, promoting standby instances, or switching storage replicas. The goal is to minimize downtime and ensure continuous availability.
Why it matters
Systems that serve real users cannot afford long outages. Failover allows services to continue operating even during hardware failures, network issues, or software crashes. It is a foundational principle behind high-availability designs.
Examples
Database failover that promotes a read replica to primary, or load balancers routing traffic away from unhealthy nodes. Lessons like High Availability and Regions and Availability Zones cover failover strategies.