What is Replication

Replication is keeping copies of the same data on multiple nodes. If one node fails, another has the data. If one node is overloaded, reads can be spread across replicas. Replication provides fault tolerance and read scalability — the two fundamental reasons distributed systems exist.

How it works

There are three main replication strategies:

Leader-follower (also called primary-replica). One node is the leader — all writes go to it. The leader streams changes to follower nodes. Followers serve read queries. If the leader fails, a follower is promoted. This is how PostgreSQL, MySQL, and MongoDB replicate by default. The trade-off: writes are limited to a single node's throughput.

Multi-leader. Multiple nodes accept writes. Each leader replicates its changes to the others. This allows writes in multiple data centers but introduces write conflicts — two leaders may update the same row simultaneously. Conflict resolution is complex (last-writer-wins, merge functions, CRDTs).

Leaderless. Clients write to multiple nodes in parallel and read from multiple nodes. A write succeeds when a quorum (majority) of nodes acknowledge it. A read queries a quorum and takes the most recent value. Cassandra and DynamoDB use this model. There is no single point of failure, but consistency guarantees depend on quorum configuration.

All strategies face the same fundamental tension: keeping replicas in sync takes time. Synchronous replication (wait for all replicas to confirm) is safe but slow. Asynchronous replication (confirm immediately, replicate later) is fast but risks data loss if the leader fails before replication completes.

Why it matters

Every production database that matters uses replication. Without it, a single disk failure or network outage means downtime and potential data loss. Replication is not optional — it is the baseline requirement for any system that needs to stay available.

See How Replication Works for the full comparison of replication strategies and their failure modes.