Consistent hashing
Distribute keys across nodes so that adding or removing a node moves a tiny fraction of them — the trick behind elastic caches and sharded stores.
The problem with hash(key) % N
The obvious way to spread keys across N nodes is node = hash(key) % N. It
balances load — until N changes. Add or remove one node and the modulus shifts,
so almost every key maps somewhere new. For a cache that means a near-total
miss storm; for a sharded store it means moving nearly all the data. That’s
catastrophic for an elastic system that adds and removes nodes routinely.
The idea: a hash ring
Consistent hashing fixes this by hashing both keys and nodes onto the same circular space (say 0 to 2³²−1, wrapping around):
- Hash each node to a point on the ring.
- Hash each key to a point on the ring.
- A key belongs to the first node clockwise from its position.
nodeA
/ \
key3 key1
| ring |
nodeC nodeB
\ /
key2 (→ belongs to nodeC, next clockwise)
Why it scales gracefully
- Add a node: it lands somewhere on the ring and takes over only the keys between it and the previous node. Just K/N keys move — everything else stays put.
- Remove a node: its keys go to the next node clockwise. Again only that node’s share moves.
Contrast with modulo, where every key can move. This bounded movement is the whole point: clusters can grow, shrink, and recover from node death with minimal disruption.
Virtual nodes (the essential refinement)
Placing each physical node at one ring point creates two problems: uneven load (gaps between nodes vary) and a lopsided handoff when a node dies (one neighbor inherits all its keys). The fix: give each physical node many virtual nodes (hundreds of points around the ring). Now load evens out statistically, and a dead node’s keys scatter across many neighbors instead of dumping on one. Virtual nodes also let you weight bigger machines (give them more virtual nodes).
Where it shows up
- Distributed caches (Memcached clients) — so resizing the pool doesn’t invalidate everything.
- Sharded/NoSQL stores (DynamoDB, Cassandra) — to place partitions and replicas on the ring.
- Load balancers doing key-affinity routing.
The interview cue
Whenever a design has a changing set of nodes holding partitioned data or cache — anything elastic, or anything that must survive node failure without a full reshuffle — say “consistent hashing with virtual nodes.” It signals you know how data actually gets placed in scalable systems, not just that it “gets sharded.”