Designing a distributed cache

Build a Redis/Memcached-style in-memory cache that spans many nodes — sharding by consistent hashing, replication for availability, and eviction under memory pressure.

The problem

Design a distributed in-memory cache (think Redis Cluster / Memcached): a key-value store living in RAM across many nodes, fronting a database to serve hot reads in sub-millisecond time. One node’s RAM is too small and a single node is a SPOF, so the cache must scale out and stay available.

Step 1 — Requirements

Functional: get(key), put(key, value, ttl), delete(key); data spread across nodes; survive node failures; evict under memory pressure.

Non-functional: very low latency (sub-ms), high throughput, horizontal scalability, high availability, and a tunable consistency stance (caches usually accept eventual/best-effort).

Step 2 — Sharding: place keys on nodes

Spread keys across nodes so each holds a slice of the data and the load:

Use consistent hashing with virtual nodes (Chapter 2) so adding/removing a cache node remaps only ~1/N of keys — critical, because naive hash % N would invalidate the entire cache whenever the pool resizes (a miss storm onto the DB).
The client (or a proxy) hashes the key to find its node.

Step 3 — Replication for availability

A pure-sharded cache loses a slice when a node dies (cold for those keys → DB load). Add replication:

Each shard has a primary + one or more replicas (leader/follower).
On primary failure, a replica is promoted (failover) so the keys stay warm.
Replication is usually asynchronous (caches favor speed/availability over strict consistency).

Step 4 — Topologies

Client-side sharding — a smart client library knows the ring and routes directly (Memcached style). Fast, no proxy hop; clients must share ring config.
Proxy-based — a proxy (e.g. twemproxy) fronts the cluster and routes. Centralizes routing; adds a hop.
Cluster-aware — nodes gossip membership and redirect clients (Redis Cluster). Self-managing.

Step 5 — Eviction and expiry

RAM is finite, so the cache must shed data:

TTL expiry — entries auto-expire (lazy on access + active sampling).
Eviction policy when full — LRU (default), LFU (keeps truly popular keys), random, or TTL-based. (Chapter 2’s eviction lesson.)

High-level design

app → cache client ─(consistent hash)→ shard primary  ⇄  replica (failover)
                                          │ miss handled by app (cache-aside)
                                          ▼
                                       database

Trade-offs to raise

Consistency vs latency — async replication and cache-aside mean brief staleness; fine for a cache, but name it.
Memory vs hit ratio — more nodes/RAM = higher hit ratio = less DB load; cost scales with it.
Invalidation — the hard part (Chapter 2): TTLs + event-based deletes on write; accept a small stale window.
Hot key — one key hammering one shard; mitigate by replicating that key to multiple nodes or a small client-side local cache.

The interview cue

“Consistent hashing with virtual nodes to shard so resizing doesn’t nuke the cache, primary+replica per shard with async replication and failover for availability, LRU + TTL eviction, and cache-aside with event-based invalidation. Hot keys get replicated or locally cached.” Sharding + replication + eviction is the skeleton; the implementation and the hot-key/stampede handling come next.