System design course
Ch. 2 · The building blocks · concept · 7 min read

Quorum

Agree by majority instead of unanimity — the rule that lets a replicated system stay consistent and available while some nodes are down.


The idea

When data is replicated across N nodes, requiring all of them to confirm a write makes you fragile — one slow or dead node blocks everything. Requiring one is fast but lets replicas disagree. A quorum is the middle path: require a majority (or a configured minimum) of nodes to agree, so the system tolerates some failures while still guaranteeing consistency.

The R + W > N rule

With N replicas, you choose how many must acknowledge:

  • W — nodes that must confirm a write before it’s considered successful.
  • R — nodes that must respond to a read.

The key inequality:

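W + R > N guarantees every read overlaps at least one node that saw the latest write, so reads can’t miss it. The reason is simple counting: a write set and a read set that shared no node would need W + R distinct nodes, and only N exist.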

Example: N = 3. Pick W = 2, R = 2. Then 2 + 2 > 3, so any read set and any write set share a node. You get strong consistency while tolerating one node down, because you never needed all three.
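The overlap claim can be checked by brute force. The sketch below is illustrative only: the function name `quorums_always_overlap` is made up for this example, and it simply enumerates every possible write set and read set for small N.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares a node with every read set of size r."""
    nodes = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(nodes, w)
        for read_set in combinations(nodes, r)
    )


print(quorums_always_overlap(3, 2, 2))  # True: 2 + 2 > 3
print(quorums_always_overlap(3, 1, 1))  # False: 1 + 1 <= 3
```

For every small configuration, the result matches the inequality: overlap is guaranteed exactly when W + R > N.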

Tuning R and W

The same machinery lets you slide along the latency/consistency dial:

  • Write-heavy, want fast writes: small W (e.g. W = 1), larger R. Writes ack quickly; reads work harder to stay consistent.
  • Read-heavy, want fast reads: small R (e.g. R = 1), larger W. Reads are cheap; writes must reach more nodes.
  • Maximize availability (accept eventual consistency): pick W + R ≤ N (e.g. W = 1, R = 1) — fast and always-on, but a read can miss a recent write.

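This is the tunable consistency offered by Dynamo-style stores such as Cassandra and Riak. DynamoDB exposes a simpler version of the same trade-off: a per-read choice between eventually consistent and strongly consistent reads.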
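To see the trade-off in action, here is a toy simulation, not a model of any real store. Everything in it is an assumption made for illustration: the `QuorumStore` class, the single global version counter (real systems use timestamps or vector clocks), random selection of replicas, and the absence of any background repair.

```python
import random
from typing import List, Optional, Tuple


class QuorumStore:
    """Toy leaderless store: each replica holds a (version, value) pair."""

    def __init__(self, n: int, w: int, r: int, seed: int = 0) -> None:
        self.n, self.w, self.r = n, w, r
        self.replicas: List[Tuple[int, Optional[str]]] = [(0, None)] * n
        self.rng = random.Random(seed)
        self.version = 0  # hypothetical global counter, for simplicity

    def write(self, value: str) -> None:
        # Store the value on w randomly chosen replicas; the others stay stale.
        self.version += 1
        for i in self.rng.sample(range(self.n), self.w):
            self.replicas[i] = (self.version, value)

    def read(self) -> Optional[str]:
        # Ask r randomly chosen replicas and keep the highest-versioned answer.
        answers = [self.replicas[i] for i in self.rng.sample(range(self.n), self.r)]
        return max(answers, key=lambda answer: answer[0])[1]


def stale_reads(n: int, w: int, r: int, trials: int = 1000) -> int:
    """Count reads that fail to return the value that was just written."""
    store = QuorumStore(n, w, r)
    stale = 0
    for t in range(trials):
        store.write(f"v{t}")
        if store.read() != f"v{t}":
            stale += 1
    return stale


print(stale_reads(3, 2, 2))  # 0: W + R > N, every read sees the latest write
print(stale_reads(3, 1, 1) > 0)  # True: W + R <= N lets reads miss recent writes
```

With W = R = 2 the read set always contains a replica holding the newest version, so picking the highest version is enough. With W = R = 1 the read often lands on a replica the write never reached.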

Why a majority specifically

A majority quorum (more than N/2) guarantees any two quorums overlap, which is what prevents two conflicting decisions from both “winning.” It’s also why consensus systems and leader elections use majorities, and why clusters are usually sized odd (3, 5, 7): an odd count can’t split into two equal halves, and it gives the best failure tolerance per node. A 5-node cluster survives 2 failures; adding a sixth node raises the majority to 4 but still survives only 2.
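The arithmetic behind the odd-size advice is short enough to tabulate. This is a minimal sketch; the helper names `majority` and `failures_tolerated` are chosen for this example.

```python
def majority(n: int) -> int:
    """Smallest node count that is strictly more than half of n."""
    return n // 2 + 1


def failures_tolerated(n: int) -> int:
    """Nodes that can be down while a majority is still reachable."""
    return n - majority(n)


for n in range(1, 8):
    print(f"N={n}  majority={majority(n)}  tolerates={failures_tolerated(n)}")
```

Going from 3 to 4 nodes, or from 5 to 6, raises the majority size without raising the number of failures tolerated, so the extra node adds cost but no resilience.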

Where it shows up

  • Leaderless replication (Dynamo, Cassandra) — read/write quorums for tunable consistency.
  • Consensus / coordination (ZooKeeper, etcd, Raft) — a majority must agree to commit, electing leaders and preventing split-brain.
  • Leader election generally — a candidate needs majority votes to lead.

The interview cue

When you’ve replicated data and the interviewer probes “how do you stay consistent if a node is down?”, reach for quorums: “With N = 3 and R = W = 2, I tolerate one failure and still guarantee consistent reads; if I needed lower latency I’d relax R and accept eventual consistency.” That answer connects replication, consistency, and availability in one move.