Distributed Systems FAQ
Common questions about the CAP theorem, replication, consistency, consensus, partitioning, and sagas. Each answer is short. Links go to the full explanation.
What is the CAP theorem?
The CAP theorem states that during a network partition, a distributed system must choose between consistency and availability. You cannot have both.
A CP system (etcd, ZooKeeper, HBase) rejects requests it cannot guarantee are consistent. If a node is cut off from the leader, it returns an error. An AP system (Cassandra, DynamoDB in default mode) continues serving requests during a partition but may return stale data.
When there is no partition (the normal case), a well-designed system provides all three properties. The trade-off only matters during actual network failures.
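The CP side of that trade-off can be sketched in a few lines. This is a hypothetical illustration, not any real system's code: a node that cannot reach a majority of the cluster refuses the request rather than risk serving inconsistent data.

```python
# Hypothetical sketch of CP behavior during a partition: a node that cannot
# reach a majority of its peers refuses requests instead of answering with
# possibly stale or divergent data.

def handle_read(reachable_peers: int, cluster_size: int) -> str:
    majority = cluster_size // 2 + 1
    # The node counts itself among the reachable members.
    if reachable_peers + 1 >= majority:
        return "value"  # placeholder for a consistent read
    raise ConnectionError("cannot reach quorum; refusing request")
```

An AP node would instead answer from its local copy in the `else` branch, trading consistency for availability.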
See How Consistency Works for real-world examples of CP and AP system behavior.
What is the difference between leader-follower and leaderless replication?
Leader-follower replication sends all writes to one node (the leader). The leader streams changes to follower replicas. Followers serve reads. If the leader fails, a follower is promoted. PostgreSQL, MySQL, and MongoDB use this model. It is simple and provides clear consistency guarantees, but the leader is a write throughput bottleneck.
Leaderless replication sends writes to multiple nodes in parallel. A write succeeds when a quorum (majority) acknowledges. Reads also query a quorum and take the most recent value. Cassandra and DynamoDB use this model. There is no single point of failure, but conflict resolution is more complex and consistency guarantees are weaker.
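The quorum condition above can be stated precisely. A minimal sketch (function names are illustrative): with N replicas, W write acknowledgments, and R read responses, every read quorum overlaps every write quorum whenever W + R > N, so at least one replica in any read has the latest write.

```python
# Quorum math for leaderless replication (illustrative names).

def quorum_overlaps(n: int, w: int, r: int) -> bool:
    # Every read quorum intersects every write quorum when W + R > N.
    return w + r > n

def read_latest(responses):
    # responses: iterable of (version, value) pairs from R replicas;
    # the reader keeps the value with the highest version.
    return max(responses)[1]
```

A common Cassandra-style configuration is N=3, W=2, R=2: 2 + 2 > 3, so overlap is guaranteed.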
See How Replication Works for failover behavior, conflict resolution, and quorum math.
What is eventual consistency?
Eventual consistency means that if no new writes occur, all replicas will eventually converge to the same value. "Eventually" is unbounded — it could be milliseconds under normal conditions or minutes during heavy load or network issues.
In the meantime, different replicas may return different values for the same key. A user writes data, the write lands on one replica, and a subsequent read hits a different replica that hasn't received the update yet. The read returns stale data.
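The stale-read scenario above fits in a toy example. This is purely illustrative: replicas are modeled as dictionaries, and replication between them has not yet happened.

```python
# Toy illustration of a stale read under eventual consistency: two replicas
# start in sync, a write lands on one, and a read routed to the other still
# sees the old value.

replicas = [{"user:1": "old"}, {"user:1": "old"}]

replicas[0]["user:1"] = "new"   # write applied to replica 0 only

stale = replicas[1]["user:1"]   # read routed to replica 1
# stale is "old" until replication (or anti-entropy) converges the replicas
```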
Eventual consistency is the default for DynamoDB and Cassandra. It is acceptable for use cases like social media feeds and analytics, but dangerous for use cases like account balances and inventory counts.
See How Consistency Works for the full spectrum from eventual to strong consistency.
What is Raft?
Raft is a consensus algorithm designed to be understandable. It decomposes consensus into three sub-problems:
- Leader election — one node is the leader. If the leader fails, followers hold an election with randomized timeouts to prevent split votes.
- Log replication — the leader appends client requests to its log and replicates entries to followers. An entry is committed when a majority acknowledges it.
- Safety — a voter grants its vote only to a candidate whose log is at least as up-to-date as its own, so a winning candidate's log is at least as up-to-date as those of a majority, ensuring committed entries are never lost.
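Two of these sub-problems reduce to very small rules. A sketch (the timeout range is illustrative; real deployments tune it): followers wait a random interval before starting an election so timeouts rarely fire at the same moment, and an entry counts as committed once a majority of the cluster holds it.

```python
import random

# Sketch of two Raft mechanics. The 150-300 ms timeout range is the one
# used in the Raft paper's examples; deployed systems configure their own.

def election_timeout_ms(low: int = 150, high: int = 300) -> int:
    # Randomized per follower, which makes simultaneous candidacies
    # (and thus split votes) unlikely.
    return random.randint(low, high)

def is_committed(acks: int, cluster_size: int) -> bool:
    # An entry is committed once a majority of nodes has appended it.
    return acks >= cluster_size // 2 + 1
```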
etcd (Kubernetes), Consul (service discovery), CockroachDB, and TiKV all use Raft.
See How Consensus Works for term numbers, log matching, and failure scenarios.
What is the difference between partitioning and replication?
Replication copies the same data to multiple nodes. The goal is fault tolerance and read scalability — if one node fails, another has the data.
Partitioning distributes different data to different nodes. The goal is storage and write scalability — no single node needs to hold the entire dataset.
Most production systems use both. Data is partitioned across nodes (so the system can handle more data than one machine can store), and each partition is replicated (so no partition is a single point of failure). CockroachDB, Cassandra, and DynamoDB all partition and replicate.
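The combination can be sketched in a few lines. This is a simplified illustration, not any real system's placement logic: real systems use consistent hashing and rebalancing-aware placement rather than plain modulo.

```python
import hashlib

# Simplified sketch of partition-then-replicate placement: hash the key to
# pick a primary node, then replicate to the next replication_factor - 1
# nodes in sequence.

def replica_nodes(key: str, num_nodes: int, replication_factor: int = 3):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    primary = h % num_nodes                       # partitioning
    return [(primary + i) % num_nodes             # replication
            for i in range(replication_factor)]
```

Each key lands on one partition (scaling storage and writes), and that partition lives on several nodes (tolerating failures).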
See How Partitioning Works for partitioning strategies and their interaction with replication.
What is a saga?
A saga is a sequence of local transactions across multiple services, where each step has a compensating action that undoes it if a later step fails. Example: an order saga charges Payment, reserves Inventory, and schedules Shipping. If Shipping fails, the saga runs compensating actions: release the reservation, refund the charge.
Sagas provide eventual consistency, not atomicity. Between steps, the system is in an intermediate state. Designing correct compensating actions — and handling the cases where compensation itself fails — is the hard part.
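The orchestration loop itself is small. A minimal sketch, assuming each step is a pair of callables (the action and its compensation); handling a failed compensation, as noted above, is the hard part and is not shown here.

```python
# Sketch of an orchestrated saga: run steps in order; if one fails, run the
# compensating actions for the already-completed steps in reverse order.

def run_saga(steps):
    """steps: list of (action, compensation) callables."""
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for comp in reversed(done):   # undo completed steps
                comp()
            return False
    return True
```

For the order example above, a Shipping failure would run the Inventory compensation (release the reservation) and then the Payment compensation (refund the charge).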
See How Distributed Transactions Work for choreography vs orchestration and real-world saga patterns.
What is two-phase commit?
Two-phase commit (2PC) is a protocol for atomic transactions across multiple nodes. It works in two phases:
- Prepare — the coordinator asks all participant nodes to prepare the transaction. Each participant writes the transaction to its local log and votes yes (ready to commit) or no (abort).
- Commit/Abort — if all participants vote yes, the coordinator sends a commit decision. If any votes no, it sends abort. Participants execute the decision.
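The coordinator's side of these two phases can be sketched directly. The participant objects here are hypothetical stand-ins; real implementations add durable logging, timeouts, and recovery.

```python
# Sketch of a 2PC coordinator. Phase 1 collects votes; phase 2 broadcasts
# commit only if every participant voted yes, otherwise abort.

def two_phase_commit(participants) -> str:
    # Phase 1: prepare. Each participant persists the transaction
    # locally and votes (True = yes, False = no).
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: broadcast the decision; participants execute it.
    for p in participants:
        p.finish(decision)
    return decision
```

The blocking problem described below lives between the two phases: if the coordinator crashes after `prepare` but before `finish`, participants hold locks with no decision.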
The problem: 2PC is blocking. If the coordinator crashes after sending prepare but before sending the commit decision, participants are stuck holding locks indefinitely until the coordinator recovers. This makes 2PC impractical for long-running or cross-datacenter transactions. Sagas are the common alternative.
See How Distributed Transactions Work for the full comparison of 2PC, 3PC, and sagas.
What is idempotency?
An operation is idempotent if executing it multiple times produces the same result as executing it once. SET balance = 100 is idempotent — running it twice still leaves the balance at 100. ADD 50 TO balance is not — running it twice adds 100.
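The SET vs ADD contrast, written as pure functions over a balance: retrying the idempotent operation changes nothing, while retrying the non-idempotent one compounds.

```python
# Idempotent vs non-idempotent updates to an account balance.

def set_balance(balance: int, v: int) -> int:
    return v                    # idempotent: f(f(x)) == f(x)

def add_to_balance(balance: int, v: int) -> int:
    return balance + v          # not idempotent: retries compound
```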
Idempotency is critical in distributed systems because network failures make retries unavoidable. A client sends a request, the server processes it, but the response is lost. The client retries. Without idempotency, the operation executes twice.
Common techniques: use an idempotency key (a unique request ID) that the server checks before processing. If the key has been seen before, return the cached response. This is how Stripe, AWS, and most payment APIs handle retries safely.
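The server side of that technique fits in a few lines. A minimal sketch with an in-memory cache for illustration; production systems persist seen keys with a TTL, and the function names here are hypothetical.

```python
# Sketch of server-side idempotency-key handling: if the key has been seen,
# replay the cached response instead of re-executing the operation.

_seen: dict[str, str] = {}

def handle(idempotency_key: str, execute) -> str:
    if idempotency_key in _seen:
        return _seen[idempotency_key]   # duplicate: replay cached response
    response = execute()                # first request: do the work once
    _seen[idempotency_key] = response
    return response
```

A retried request with the same key gets the same response, and the side effect (the charge, the reservation) happens exactly once.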
See How Distributed Transactions Work for idempotency keys, deduplication, and exactly-once semantics.