How Replication Works — Copies, Leaders, and Followers

2026-03-23

A single database server is a single point of failure. If it crashes, your application is down. If the disk fails, your data is lost. Replication solves this by keeping copies of the data on multiple machines.

But replication introduces the hardest problems in distributed systems: when data changes on one machine, how do you keep the copies in sync? What happens when a machine fails? What if the network between machines fails? Every answer involves a tradeoff between consistency, availability, and performance.

Why Replicate?

Fault tolerance — if one machine fails, others have the data. The system continues operating. For critical data, three replicas are standard — any single machine can fail without data loss.

Read performance — read queries can be served by any replica, so five replicas can serve roughly five times the read load of a single machine. This is especially valuable when replicas sit in different geographic regions, letting reads be served from the nearest location.

Durability — data on one disk can be lost to hardware failure. Data on three disks across three machines (ideally in different data centers) survives almost any failure scenario.

Leader-Follower Replication

The most common replication model. One node is the leader (primary). All writes go to the leader. The leader sends the changes to followers (replicas), which apply them.

Writes: client → leader only. The leader writes to its WAL and data files, then streams the WAL records to followers.

Reads: client → leader or any follower. Reading from a follower may return slightly stale data (if the follower hasn't applied the latest changes yet).

This is how PostgreSQL streaming replication, MySQL replication, and MongoDB replica sets work.

[Diagram: clients send writes to the leader, which appends to its WAL and data files; the WAL stream is replayed by Follower 1 and Follower 2; clients can read from any node. Writes go to the leader only. Follower reads may be slightly stale (replication lag).]
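
The flow above can be sketched in a few lines. This is a minimal in-memory model, not how any real database implements it: the `Leader`/`Follower` classes, LSN scheme, and synchronous delivery are all illustrative assumptions (real systems stream WAL records over the network and persist them to disk).

```python
# Hypothetical sketch of leader-follower WAL streaming (in-memory only).

class Follower:
    def __init__(self):
        self.data = {}        # replayed state
        self.applied_lsn = 0  # last applied log sequence number

    def replay(self, lsn, key, value):
        # Apply a WAL record in log order.
        self.data[key] = value
        self.applied_lsn = lsn

class Leader:
    def __init__(self, followers):
        self.data = {}
        self.wal = []         # append-only write-ahead log
        self.followers = followers

    def write(self, key, value):
        lsn = len(self.wal) + 1
        self.wal.append((lsn, key, value))  # 1. log the change
        self.data[key] = value              # 2. apply locally
        for f in self.followers:            # 3. stream to followers
            f.replay(lsn, key, value)
        return lsn

f1, f2 = Follower(), Follower()
leader = Leader([f1, f2])
leader.write("user:1", "alice")
```

After the write, both followers hold the same state as the leader; in a real system the `replay` calls happen asynchronously over the WAL stream, which is where replication lag comes from.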

Synchronous vs Asynchronous

Synchronous replication — the leader waits for at least one follower to confirm the write before acknowledging the client. Guarantees the data exists on multiple machines. But slower: every write waits for a network round trip to a follower.

Asynchronous replication — the leader acknowledges the client immediately after writing locally. The follower receives the change later. Faster, but if the leader crashes before the follower catches up, the latest writes are lost.

Most production systems use semi-synchronous: one follower is synchronous (guarantees at least one copy), the rest are asynchronous (performance).
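
A sketch of that semi-synchronous acknowledgment order, with all names invented for illustration: the client is acked only after the leader's local write and one synchronous follower's confirmation, while the remaining followers catch up from a queue later.

```python
# Hypothetical semi-synchronous replication sketch (in-memory only).
from collections import deque

leader_wal = []
sync_follower = []      # confirmed before the client is acknowledged
async_queue = deque()   # records awaiting background replication
async_follower = []

def write(record):
    leader_wal.append(record)      # 1. leader writes locally (WAL)
    sync_follower.append(record)   # 2. wait for the sync follower's confirm
    async_queue.append(record)     # 3. hand off to async replication
    return "ack"                   # 4. acknowledge the client

def drain_async():
    # Background process; until it runs, async followers lag the leader.
    while async_queue:
        async_follower.append(async_queue.popleft())

write("x=1")
lag = len(leader_wal) - len(async_follower)  # 1 record behind
drain_async()                                # async follower catches up
```

If the leader crashed before `drain_async` ran, the synchronous follower would still hold `x=1`, which is exactly the guarantee semi-synchronous replication buys.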

Failover

When the leader fails:

  1. Detect the failure (heartbeat timeout, typically 10-30 seconds).
  2. Elect a new leader from the followers (the most up-to-date follower).
  3. Redirect clients to the new leader.

This is called failover, and it is the most dangerous operation in replication. Several things can go wrong: the old leader comes back and keeps accepting writes (split brain: two leaders at once), the newly elected leader is behind the old one (recent writes are lost), and clients keep sending requests to the old leader.
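
Step 2 of failover, electing the most up-to-date follower, can be sketched by comparing applied log positions. The follower names and LSN values here are illustrative, and real elections (e.g. via Raft or an external coordinator) also need votes and fencing, which this sketch omits.

```python
# Hypothetical election sketch: pick the follower with the highest
# applied log sequence number, minimizing lost recent writes.

followers = {"replica-a": 1041, "replica-b": 1057, "replica-c": 998}

def elect_leader(followers):
    # Most caught-up follower by applied LSN.
    return max(followers, key=followers.get)

new_leader = elect_leader(followers)
# replica-b wins; any writes the old leader acknowledged past LSN 1057
# (under asynchronous replication) are lost.
```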

Multi-Leader Replication

Multiple nodes accept writes. Each leader replicates to the others. Used when you need writes in multiple data centers (writing to a single remote leader adds cross-datacenter latency to every write).

The fundamental problem: write conflicts. Two leaders can modify the same row simultaneously. When their changes replicate to each other, which version wins?

Conflict resolution strategies:

  • Last writer wins (LWW) — the write with the latest timestamp wins. Simple but can lose data (clocks aren't perfectly synchronized).
  • Custom resolution — the application defines merge logic. CRDTs (Conflict-free Replicated Data Types) are data structures designed to merge automatically.
  • Keep both versions — present the conflict to the user or application to resolve.

Multi-leader is complex and error-prone. Most applications avoid it unless cross-datacenter writes are truly necessary.

Leaderless Replication

No designated leader. Any node accepts writes. The client sends writes to multiple nodes simultaneously. Reads also go to multiple nodes, and the client picks the most recent version.

Quorum reads/writes: with N replicas, require W nodes to acknowledge a write and R nodes to respond to a read, where W + R > N. This guarantees that the nodes contacted by any read overlap, in at least one node, the nodes that acknowledged the latest write.

Example: 3 replicas, W=2, R=2. A write succeeds when 2 of 3 nodes confirm. A read queries 2 of 3 nodes. At least one of those 2 read nodes also confirmed the write.
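
The arithmetic and the overlap guarantee from that example, as a sketch. The versioned-replica layout and the "contact the first R nodes" simplification are assumptions for illustration; real clients pick nodes dynamically and handle timeouts.

```python
# Hypothetical quorum sketch for N=3, W=2, R=2.

def quorum_ok(n, w, r):
    # Safe when the read set and write set must overlap.
    return w + r > n

# Each replica stores (version, value). A W=2 write of "new" at version 7
# reached the first and third replicas; the middle one is still stale.
replicas = [(7, "new"), (6, "old"), (7, "new")]

def quorum_read(replicas, r):
    responses = replicas[:r]   # contact R nodes (simplified: the first R)
    return max(responses)[1]   # return the highest-versioned value seen

# Even though one of the two nodes read is stale, the overlap with the
# write set guarantees the latest version appears in the responses.
value = quorum_read(replicas, r=2)
```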

Amazon's Dynamo, Apache Cassandra, and Riak use leaderless replication. It provides high availability (no single leader to fail) at the cost of complexity (conflict resolution, read repair, anti-entropy).

What Can Go Wrong?

Replication lag — followers fall behind the leader. A user writes a comment (goes to the leader) and immediately refreshes (reads from a follower that hasn't received the comment yet). The comment appears to vanish.

Split brain — during failover, both the old and new leader accept writes. Now there are conflicting versions of the data. Fencing (shutting down the old leader forcefully) prevents this.

Divergent state — in multi-leader or leaderless systems, replicas can have different data indefinitely. Anti-entropy processes and read repair gradually converge the replicas, but the window of inconsistency can be large.
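
Read repair, one of the convergence mechanisms just mentioned, can be sketched as follows. The replica layout (dicts mapping keys to hypothetical `(version, value)` pairs) is invented for illustration; real systems use vector clocks or timestamps and repair in the background as well.

```python
# Hypothetical read-repair sketch: after a read sees divergent versions,
# the newest version is written back to the stale replicas.

def read_with_repair(replicas, key):
    # replicas: list of dicts mapping key -> (version, value)
    responses = [(r[key], r) for r in replicas if key in r]
    (version, value), _ = max(responses, key=lambda x: x[0][0])
    for (v, _val), replica in responses:
        if v < version:
            replica[key] = (version, value)  # repair the stale replica
    return value

r1 = {"k": (3, "fresh")}
r2 = {"k": (2, "stale")}
result = read_with_repair([r1, r2], "k")  # returns "fresh"; r2 is repaired
```

Each read thus nudges the replicas toward convergence, which is why divergence in leaderless systems shrinks over time rather than being fixed at a single point.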

These failure modes motivate the next topic: consistency models that define what guarantees you get despite replication.

Next Steps