System design course
Ch. 2 · The building blocks · how to build it · 8 min read

Picking the right building block — a decision guide

Turn this chapter's parts into reflexes — a symptom-to-solution map so the right component comes to mind the moment a requirement appears.


How to use this

You now have the parts. The skill is reaching for the right one on cue — when the interviewer states a requirement, the matching block should pop into your head with its trade-off attached. This lesson is that lookup table, plus a worked reflex drill. Treat it as the recap before you start designing real systems in Chapter 4.

Symptom → reach for

| When you hear / observe… | Reach for | And name the cost |
| --- | --- | --- |
| “Too many requests for one server” | Load balancer + horizontal scaling | LB needs its own redundancy; sticky sessions hurt even balancing |
| “Reads dominate; DB is hot” | Cache (cache-aside) + read replicas | Invalidation; stale reads; stampedes |
| “Data won’t fit / writes too high for one node” | Sharding by a high-cardinality key | Cross-shard queries; hotspots; rebalancing |
| “Queries are slow on a big table” | Index the filtered/sorted columns | Slower writes; storage |
| “Can’t lose this data / node might die” | Replication (+ redundancy) | Sync latency vs async data-loss window |
| “Stay consistent while a node is down” | Quorum (R + W > N) | Latency rises as R or W grows |
| “Who coordinates writes?” | Leader/follower + election | Split-brain; failover loss |
| “Cluster grows and shrinks a lot” | Consistent hashing (virtual nodes) | More moving parts than modulo |
| “Push updates to clients live” | WebSockets / SSE / long-poll | Stateful connections to scale |
| “Is X in this huge set?” (cheap reject) | Bloom filter in front | False positives → confirm on yes |
| “Detect a dead node” | Heartbeats + timeout | Can’t tell slow from dead |
| “Catch silent corruption” | Checksums / Merkle trees | Small compute overhead |
| “Strong consistency or uptime under partition?” | Decide CP vs AP (per data type) | The other is sacrificed during partitions |
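
The quorum row's condition, R + W > N, is easier to trust once you see why it works: any read set and any write set are then forced to share at least one replica, so a read always touches a copy that saw the latest write. The sketch below checks this by brute force for a few small configurations. The function name `quorums_overlap` is made up for illustration, and the model ignores timing, failures, and versioning entirely.

```python
from itertools import combinations

def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True if every read set of size r shares a replica with every write set of size w.

    Illustrative brute force over all subsets of n replicas; only sensible for small n.
    """
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )

if __name__ == "__main__":
    for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 2), (5, 2, 3)]:
        print(
            f"N={n} R={r} W={w}: "
            f"R+W>N is {r + w > n}, overlap guaranteed: {quorums_overlap(n, r, w)}"
        )
```

In every configuration tried, the overlap guarantee holds exactly when R + W > N. The cost named in the table shows up here too: raising R or W to satisfy the inequality means each operation waits on more replicas.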

A reflex drill

Read each prompt, then name your block and its cost before reading on.

  1. “A redirect service gets 100k reads/sec, mostly the same hot links.” → Cache the hot mappings; the DB only sees misses. Cost: invalidation when a link changes, and a stampede risk if a hot key expires.
  2. “Users in Europe see high latency reading their profile.” → A read replica in-region (locality). Cost: that replica may serve slightly stale data (eventual consistency) — fine for a profile.
  3. “The orders table is billions of rows and growing.” → Shard by a key that spreads load (e.g. customer id); index the common query columns. Cost: cross-shard reporting must scatter-gather. Keep each order’s data on one shard so that single-order queries touch only that shard.
  4. “We can’t lose a committed payment, ever.” → Synchronous replication with a quorum, CP under partition. Cost: writes pay coordination latency and can be refused during a partition — acceptable for money.
  5. “Show a typing indicator in chat.” → WebSockets (bidirectional, high-frequency). Cost: a pub/sub backplane to fan out across connection servers.
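
Drill 3 can be sketched in a few lines to make the trade-off concrete: a per-customer lookup touches one shard, while a report has to visit all of them. This is a minimal in-memory sketch, not a real storage layer. The shard count, the `shards` dict, and every function name are assumptions made for illustration, and plain modulo placement is used only because it is the simplest option.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed fixed shard count for this sketch

# In-memory stand-in for the shards: shard index -> list of order dicts.
shards = defaultdict(list)

def shard_for(customer_id: str) -> int:
    """Map a customer id to a shard using a stable hash.

    Python's built-in hash() is salted per process for strings,
    so a cryptographic digest is used to get a repeatable placement.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

def insert_order(order: dict) -> None:
    """Store the order on its customer's shard."""
    shards[shard_for(order["customer_id"])].append(order)

def orders_for_customer(customer_id: str) -> list:
    """Single-shard query: only the customer's own shard is read."""
    home = shards[shard_for(customer_id)]
    return [o for o in home if o["customer_id"] == customer_id]

def total_revenue_cents() -> int:
    """Scatter-gather query: every shard is visited and the partial sums are combined."""
    partials = [sum(o["amount_cents"] for o in shards[i]) for i in range(NUM_SHARDS)]
    return sum(partials)

if __name__ == "__main__":
    insert_order({"order_id": 1, "customer_id": "alice", "amount_cents": 1250})
    insert_order({"order_id": 2, "customer_id": "bob", "amount_cents": 800})
    insert_order({"order_id": 3, "customer_id": "alice", "amount_cents": 400})
    print(orders_for_customer("alice"))  # reads one shard
    print(total_revenue_cents())         # reads all shards
```

Because placement is `hash % NUM_SHARDS`, changing the shard count moves most keys, which is the rebalancing cost from the table and the reason consistent hashing appears there as the answer for clusters that resize often.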

The meta-rule

For every block you place, say the requirement that justifies it and the new problem it introduces. “I’ll add a cache (reads dominate), which creates an invalidation problem I’ll handle with TTLs plus event-based eviction.” That two-part sentence — justification and cost — is the difference between naming technologies and designing systems. Carry it into Chapter 4.
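
The example sentence above, a cache with TTLs plus event-based eviction, can be sketched as a small cache-aside wrapper. Treat this as one possible shape under stated assumptions rather than a reference implementation: the class name `CacheAside`, the `load_from_db` callback, and the `on_change` hook are all hypothetical, the store is a plain in-process dict, and nothing here protects against a stampede when a hot key expires.

```python
import time

class CacheAside:
    """Cache-aside reads with a TTL, plus explicit eviction on change events (sketch)."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db   # hypothetical callback: key -> value from the DB
        self._ttl = ttl_seconds
        self._clock = clock         # injectable so expiry can be exercised without sleeping
        self._entries = {}          # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]                      # hit: the DB is not touched
        value = self._load(key)                  # miss or expired: go to the DB
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def on_change(self, key):
        """Event-based eviction: call this when the underlying row changes."""
        self._entries.pop(key, None)

if __name__ == "__main__":
    db = {"abc123": "https://example.com/old"}
    cache = CacheAside(lambda k: db[k], ttl_seconds=30.0)

    print(cache.get("abc123"))   # miss, loaded from db
    db["abc123"] = "https://example.com/new"
    print(cache.get("abc123"))   # still the old value until TTL or eviction
    cache.on_change("abc123")
    print(cache.get("abc123"))   # reloaded after the change event
```

The TTL bounds how long a stale value can survive if a change event is missed, and the eviction hook shortens staleness when the event does arrive. Both halves of the sentence are visible in the code: the justification is the hit path in `get`, and the cost is everything else.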