What is Consensus

Consensus is the problem of getting multiple nodes in a distributed system to agree on a single value, even when some nodes fail or network messages are lost. It sounds simple, but it is notoriously hard to get right once nodes can crash and messages can be delayed or dropped. Without consensus, distributed systems cannot safely elect leaders, commit transactions across nodes, or maintain a consistent replicated log.

How it works

A consensus algorithm must satisfy three properties:

  • Agreement — all non-faulty nodes decide the same value.
  • Validity — the decided value was proposed by some node (the algorithm doesn't invent values).
  • Termination — all non-faulty nodes eventually decide (the algorithm doesn't run forever).
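As a rough illustration, the three properties can be written as checks over the outcome of a single finished run. This is only a sketch: `check_consensus`, the node names, and the convention that `None` means "never decided" are assumptions made for the example, and a real termination proof concerns every possible run, not one observed run.

```python
from typing import Dict, Hashable, Optional, Set


def check_consensus(
    proposed: Set[Hashable],
    decisions: Dict[str, Optional[Hashable]],
    faulty: Set[str],
) -> Dict[str, bool]:
    """Check agreement, validity, and termination for one finished run.

    Hypothetical helper: `decisions` maps node name -> decided value,
    with None meaning the node never decided.
    """
    # Agreement and termination only bind the non-faulty nodes.
    correct = {n: v for n, v in decisions.items() if n not in faulty}
    decided = [v for v in correct.values() if v is not None]
    return {
        # All non-faulty nodes that decided chose the same value.
        "agreement": len(set(decided)) <= 1,
        # Every decided value was actually proposed by some node.
        "validity": all(v in proposed for v in decided),
        # Every non-faulty node reached a decision.
        "termination": all(v is not None for v in correct.values()),
    }


result = check_consensus(
    proposed={"x", "y"},
    decisions={"n1": "x", "n2": "x", "n3": None},
    faulty={"n3"},  # n3 crashed, so its missing decision is allowed
)
print(result)
```

Here n3 never decides, but because it is counted as faulty the run still satisfies all three properties.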

The two best-known consensus algorithms are Paxos and Raft. Raft is built around an elected leader that proposes values; basic Paxos lets any proposer propose, and practical deployments (Multi-Paxos) usually elect a stable leader as an optimization. In both, a value is committed once a majority (quorum) of nodes has accepted it.
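The majority rule can be sketched as a small predicate over the acknowledgements collected so far. The function name `is_committed` and the node names are illustrative assumptions; neither Paxos nor Raft defines an API by that name, and real implementations also track terms or ballot numbers that this sketch leaves out.

```python
from typing import Iterable, Set


def is_committed(acks: Iterable[str], cluster: Set[str]) -> bool:
    """True once a strict majority of the cluster has acknowledged the value."""
    # Ignore duplicate acknowledgements and acks from nodes outside the cluster.
    valid_acks = set(acks) & cluster
    return len(valid_acks) > len(cluster) // 2


cluster = {"n1", "n2", "n3", "n4", "n5"}
print(is_committed(["n1"], cluster))              # False: 1 of 5
print(is_committed(["n1", "n2", "n2"], cluster))  # False: duplicate counts once
print(is_committed(["n1", "n2", "n3"], cluster))  # True: 3 of 5 is a majority
```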

Crash-tolerant consensus algorithms such as Paxos and Raft require a majority quorum: more than half the nodes must be reachable. In general, a cluster of 2f + 1 nodes tolerates f failures, so a 3-node cluster tolerates 1 failure and a 5-node cluster tolerates 2. If a majority is unreachable, the system stops accepting writes. It chooses consistency over availability, which makes it a CP system in CAP theorem terms.
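The sizing arithmetic is short enough to write out. The helper names below are made up for this example; the sketch assumes crash failures only and a simple majority quorum.

```python
def quorum_size(n: int) -> int:
    """Smallest number of nodes that is more than half of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can be down while a quorum is still reachable."""
    return n - quorum_size(n)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Note that a 4-node cluster tolerates no more failures than a 3-node one, which is why clusters are usually sized with an odd number of nodes.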

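The FLP impossibility result (Fischer, Lynch, and Paterson, 1985) proved that no deterministic consensus algorithm can guarantee termination in a fully asynchronous system where even one node can crash. Practical algorithms like Raft and Paxos work around this by always preserving agreement and validity and relying on timeouts to make progress; Raft additionally randomizes its election timers so that repeated split votes are unlikely. Termination is therefore not guaranteed in theory, but it is reached in practice whenever the network behaves well for long enough.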
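The idea behind randomized election timers can be sketched as follows. The function name, the node names, and the 150 to 300 ms range are assumptions for illustration (real systems make the range configurable); the point is only that each node draws a different timeout, so one node usually times out first and can ask for votes before the others start competing.

```python
import random
from typing import Dict


def election_timeout(
    rng: random.Random, base_ms: float = 150.0, spread_ms: float = 150.0
) -> float:
    """Draw a timeout in [base_ms, base_ms + spread_ms] (assumed range)."""
    return base_ms + rng.uniform(0.0, spread_ms)


rng = random.Random(42)  # seeded only so the sketch is repeatable
timeouts: Dict[str, float] = {
    node: election_timeout(rng) for node in ("n1", "n2", "n3")
}

# The node with the shortest timeout notices the missing leader first
# and becomes a candidate before the others do.
first_candidate = min(timeouts, key=timeouts.get)
print(first_candidate, round(timeouts[first_candidate], 1))
```

Randomization makes ties unlikely rather than impossible, which is consistent with FLP: progress is probabilistic, while safety never depends on the timers.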

Why it matters

Consensus is the foundation of most strongly consistent distributed systems. etcd (used by Kubernetes) and CockroachDB use Raft, and ZooKeeper (long used by Kafka for coordination) uses Zab, a closely related atomic broadcast protocol. When you read that a replicated system provides "strong consistency" or "linearizability," consensus is usually the mechanism making it possible.

See How Consensus Works for the full walkthrough of leader election and log replication.