System design course
Ch.2 · The building blocks · concept · 9 min read

Caching strategies

Keep hot data close to readers to cut latency and offload the database — plus the eviction and invalidation problems caching hands you back.


Why caching is the highest-leverage move

Most consumer systems are read-heavy, and most reads ask for the same small “hot” set of data. A cache keeps that hot data in fast storage (memory) close to where it’s read, so the common request never touches the slow, contended database. It’s usually the single biggest latency-and-cost win in a design — and the first thing to reach for when an estimate shows heavy reads.
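A rough back-of-envelope sketch makes the leverage concrete. The function names are hypothetical and the numbers in the usage lines are illustrative assumptions, not measurements. The model also assumes that a miss pays for the cache lookup and then the database read.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, db_ms: float) -> float:
    """Expected read latency when `hit_ratio` of reads are served from the cache.

    Assumption: every read pays the cache lookup; only misses also pay the DB read.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return cache_ms + (1.0 - hit_ratio) * db_ms


def db_reads_per_second(total_reads_per_second: float, hit_ratio: float) -> float:
    """Reads that still reach the database after the cache absorbs the hits."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_reads_per_second * (1.0 - hit_ratio)


if __name__ == "__main__":
    # Illustrative figures only: 1 ms cache, 20 ms database, 10,000 reads/s.
    for ratio in (0.0, 0.5, 0.9, 0.99):
        latency = effective_latency_ms(ratio, cache_ms=1.0, db_ms=20.0)
        load = db_reads_per_second(10_000, ratio)
        print(f"hit ratio {ratio:.2f}: ~{latency:.1f} ms per read, ~{load:.0f} DB reads/s")
```

Under these assumed numbers, both the average latency and the database load fall roughly in proportion to the miss ratio, which is why a high hit ratio on a small hot set pays off so quickly.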

Where caches live

Caching happens at every layer, and you’ll often use several:

  • Client / browser — avoid the request entirely.
  • CDN — cache static assets near users geographically (its own lesson in Chapter 3).
  • Application / distributed cache — a shared in-memory store like Redis or Memcached, sitting between app servers and the database. This is the one interviews usually mean.
  • Database cache — the DB’s own buffer pool.

Read patterns

  • Cache-aside (lazy loading) — the app checks the cache; on a miss it reads the DB, stores the result, and returns it. Simple and common; only requested data is cached. Downside: the first request for each key is always a miss, and stale data can linger until eviction.
  • Read-through — the app always asks the cache, and the cache fetches from the DB on a miss. Cleaner app code; the cache library owns the loading logic.
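The two read patterns above can be sketched as follows. This is a minimal illustration, not a production client: `FakeDatabase` and the plain dicts stand in for a real database and a store such as Redis, and every name here is hypothetical.

```python
from typing import Callable, Dict, Optional


class FakeDatabase:
    """Hypothetical stand-in for the slow store; counts how often it is read."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self._rows = dict(rows)
        self.reads = 0

    def read(self, key: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(key)


def get_cache_aside(key: str, cache: Dict[str, str], db: FakeDatabase) -> Optional[str]:
    """Cache-aside: the application checks the cache and fills it on a miss."""
    if key in cache:
        return cache[key]          # hit: the database is never touched
    value = db.read(key)           # miss: the app itself goes to the database
    if value is not None:
        cache[key] = value         # store the result for later readers
    return value


class ReadThroughCache:
    """Read-through: callers only talk to the cache, which owns the loading logic."""

    def __init__(self, loader: Callable[[str], Optional[str]]) -> None:
        self._loader = loader
        self._store: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key not in self._store:
            value = self._loader(key)   # the cache, not the app, fetches on a miss
            if value is None:
                return None
            self._store[key] = value
        return self._store[key]


if __name__ == "__main__":
    db = FakeDatabase({"user:1": "Ada", "user:2": "Grace"})
    cache: Dict[str, str] = {}
    get_cache_aside("user:1", cache, db)
    get_cache_aside("user:1", cache, db)
    print("DB reads after two cache-aside lookups:", db.reads)
```

The behavior is the same in both cases; the difference is who owns the miss path. With cache-aside that logic is repeated wherever the app reads, while read-through keeps it in one place behind `get`.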

Write patterns

How you keep the cache and database from disagreeing:

  • Write-through — write to cache and DB together (synchronously). Reads after a write are fresh; writes pay extra latency.
  • Write-back (write-behind) — write to cache, flush to DB asynchronously. Fast writes and great for write-heavy bursts, but a cache crash before flush loses data.
  • Write-around — write straight to the DB and skip the cache; data is cached only when later read. Avoids filling the cache with write-once data.

(Read-through vs write-through is examined as a trade-off in Chapter 3.)
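The three write patterns can be compared side by side in a small sketch. The `CachedStore` class and its dict-backed "cache" and "database" are hypothetical stand-ins. One detail is an assumption beyond the text: write-around here also drops any existing cached copy so that a later read cannot return the old value.

```python
from typing import Any, Dict


class CachedStore:
    """Toy store showing write-through, write-back, and write-around side by side."""

    def __init__(self) -> None:
        self.db: Dict[str, Any] = {}       # stand-in for the durable database
        self.cache: Dict[str, Any] = {}    # stand-in for the in-memory cache
        self.pending: Dict[str, Any] = {}  # write-back entries not yet flushed

    def write_through(self, key: str, value: Any) -> None:
        # Both stores are updated before the write returns, so reads stay fresh.
        self.db[key] = value
        self.cache[key] = value

    def write_back(self, key: str, value: Any) -> None:
        # Only the cache is updated now; the database catches up on flush().
        self.cache[key] = value
        self.pending[key] = value

    def flush(self) -> None:
        # In a real system this runs asynchronously; anything still pending
        # when the cache process dies is lost.
        self.db.update(self.pending)
        self.pending.clear()

    def write_around(self, key: str, value: Any) -> None:
        # The write goes straight to the database and is not cached.
        self.db[key] = value
        # Assumption for this sketch: remove any older cached copy.
        self.cache.pop(key, None)


if __name__ == "__main__":
    store = CachedStore()
    store.write_back("cart:7", ["book"])
    print("in DB before flush:", "cart:7" in store.db)
    store.flush()
    print("in DB after flush:", "cart:7" in store.db)
```

The window between `write_back` and `flush` is exactly where the data-loss risk of write-back lives.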

Eviction: the cache is finite

When the cache fills, something must go. Common policies:

  • LRU (least recently used) — evict what hasn’t been touched longest. The sensible default.
  • LFU (least frequently used) — evict the least-accessed; keeps genuinely popular items.
  • FIFO / TTL — evict oldest, or expire entries after a fixed time.

TTLs are your friend: even without explicit invalidation, an entry can be no staler than its TTL, so a short expiry puts a hard upper bound on staleness.
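As an illustration of how LRU eviction and a TTL can work together, here is a minimal single-threaded sketch built on Python's `collections.OrderedDict`. The class name and the injectable `clock` parameter are assumptions made for this example; real caches such as Redis implement eviction differently.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class LRUCacheWithTTL:
    """Bounded cache: evicts the least recently used entry, expires old ones."""

    def __init__(
        self,
        capacity: int,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: "OrderedDict[str, Tuple[Any, float]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]       # expired: treat as a miss
            return None
        self._entries.move_to_end(key)   # mark as most recently used
        return value

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop the least recently used


if __name__ == "__main__":
    cache = LRUCacheWithTTL(capacity=2, ttl_seconds=60.0)
    cache.put("a", 1)
    cache.put("b", 2)
    cache.get("a")       # "a" is now more recently used than "b"
    cache.put("c", 3)    # over capacity, so "b" is evicted
    print(cache.get("b"), cache.get("a"), cache.get("c"))
```

Expiry is checked lazily on read in this sketch, so an expired entry still occupies a slot until it is read or pushed out by LRU eviction.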

Invalidation: the genuinely hard part

“There are only two hard things in computer science: cache invalidation and naming things.” (commonly attributed to Phil Karlton)

When the underlying data changes, stale cache entries must be updated or removed. There are three main strategies: write-through (update the cache on every write), TTL (let the entry expire), and explicit invalidation (delete the key when the data changes, often triggered by an event). The trade-off is staleness versus cost: aggressive invalidation keeps data fresh but adds work and complexity; lazy TTLs are cheap but serve stale reads until the entry expires.
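Explicit invalidation driven by an update event might look like the sketch below. The `InvalidatingStore` class, its listener list, and the in-process callbacks are all hypothetical simplifications. In a real deployment the event would more likely travel over a message bus or a change feed.

```python
from typing import Any, Callable, Dict, List, Optional


class InvalidatingStore:
    """Cache-aside reads plus delete-on-update invalidation via listeners."""

    def __init__(self) -> None:
        self.db: Dict[str, Any] = {}      # stand-in for the database
        self.cache: Dict[str, Any] = {}   # stand-in for the cache
        self._listeners: List[Callable[[str], None]] = []

    def on_update(self, listener: Callable[[str], None]) -> None:
        """Register a callback that receives the key of every updated row."""
        self._listeners.append(listener)

    def update(self, key: str, value: Any) -> None:
        self.db[key] = value              # write the source of truth first
        for listener in self._listeners:  # then announce the change
            listener(key)

    def read(self, key: str) -> Optional[Any]:
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value       # repopulate lazily on the next read
        return value


if __name__ == "__main__":
    store = InvalidatingStore()
    # The invalidator deletes the key instead of writing the new value.
    store.on_update(lambda key: store.cache.pop(key, None))
    store.update("price:42", 10)
    print(store.read("price:42"))
    store.update("price:42", 12)
    print(store.read("price:42"))
```

Deleting the key rather than overwriting it keeps the invalidator simple: it never needs to know how to build the cached value, and the next read rebuilds it from the database.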

The failure modes to name

  • Thundering herd / cache stampede — a hot key expires and thousands of requests hit the DB at once. Mitigate with request coalescing, staggered TTLs, or refreshing slightly before expiry.
  • Cold cache — after a restart the cache is empty and the DB takes the full load; warm it or ramp traffic.
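Two of the stampede mitigations named above, request coalescing and staggered TTLs, can be sketched with the standard library. This is an illustrative single-process version with hypothetical names (`CoalescingLoader`, `jittered_ttl`). In a distributed setup, coalescing across servers usually needs a shared lock or similar mechanism, which is not shown here.

```python
import random
import threading
from typing import Any, Callable, Dict, Optional


def jittered_ttl(base_seconds: float, spread: float = 0.1) -> float:
    """Vary a TTL by up to +/- `spread` so keys set together do not expire together."""
    return base_seconds * (1.0 + random.uniform(-spread, spread))


class _Call:
    """Shared state for one in-flight load of a key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[Exception] = None


class CoalescingLoader:
    """Lets one caller load a key while concurrent callers wait for its result."""

    def __init__(self, load: Callable[[str], Any]) -> None:
        self._load = load
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def get(self, key: str) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            is_leader = call is None
            if is_leader:
                call = _Call()
                self._in_flight[key] = call
        if is_leader:
            try:
                call.result = self._load(key)   # the only trip to the database
            except Exception as exc:
                call.error = exc                # followers see the same failure
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()                 # wake every waiting follower
        else:
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result
```

In practice the leader would also write the loaded value back to the cache, using something like `jittered_ttl(300)` for the expiry, so that requests arriving after the load finishes become ordinary hits.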

Bringing up invalidation and stampedes unprompted shows you’ve actually run a cache, not just drawn one.