Picking the right building block — a decision guide
Turn this chapter's parts into reflexes — a symptom-to-solution map so the right component comes to mind the moment a requirement appears.
How to use this
You now have the parts. The skill is reaching for the right one on cue — when the interviewer states a requirement, the matching block should pop into your head with its trade-off attached. This lesson is that lookup table, plus a worked reflex drill. Treat it as the recap before you start designing real systems in Chapter 4.
Symptom → reach for
| When you hear / observe… | Reach for | And name the cost |
|---|---|---|
| “Too many requests for one server” | Load balancer + horizontal scaling | LB redundancy; sticky sessions hurt |
| “Reads dominate; DB is hot” | Cache (cache-aside) + read replicas | Invalidation; stale reads; stampedes |
| “Data won’t fit / writes too high for one node” | Sharding by a high-cardinality key | Cross-shard queries; hotspots; rebalancing |
| “Queries are slow on a big table” | Index the filtered/sorted columns | Slower writes; storage |
| “Can’t lose this data / node might die” | Replication (+ redundancy) | Sync latency vs async data-loss window |
| “Stay consistent while a node is down” | Quorum (R + W > N) | Latency rises with R/W |
| “Who coordinates writes?” | Leader/follower + election | Split-brain; failover loss |
| “Cluster grows and shrinks a lot” | Consistent hashing (virtual nodes) | More moving parts than modulo |
| “Push updates to clients live” | WebSockets / SSE / long-poll | Stateful connections to scale |
| “Is X in this huge set?” (cheap reject) | Bloom filter in front | False positives → confirm on yes |
| “Detect a dead node” | Heartbeats + timeout | Can’t tell slow from dead |
| “Catch silent corruption” | Checksums / Merkle trees | Small compute overhead |
| “Strong consistency or uptime under partition?” | Decide CP vs AP (per data type) | The other is sacrificed during partitions |
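To make the “Cluster grows and shrinks a lot” row concrete, here is a minimal Python sketch of consistent hashing with virtual nodes, compared against plain `hash % N` placement. The class and function names, the MD5-based hash, and the choice of 100 virtual nodes per server are illustrative assumptions, not a prescribed implementation.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    # 64-bit hash taken from MD5; Python's built-in hash() is randomized per process for str.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring: each node owns many points ("virtual nodes") on a hash circle."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove(self, node):
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            self._points.remove(point)
            del self._owner[point]

    def node_for(self, key):
        # Walk clockwise to the first ring point at or after the key's hash, wrapping at the end.
        i = bisect.bisect(self._points, stable_hash(key)) % len(self._points)
        return self._owner[self._points[i]]


def moved_fraction(before, after, keys):
    # Share of keys whose owning node changes between two placement functions.
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
old_nodes = ["a", "b", "c", "d"]
ring_before = ConsistentHashRing(old_nodes)
ring_after = ConsistentHashRing(old_nodes + ["e"])  # the cluster grows by one node

ring_moved = moved_fraction(ring_before.node_for, ring_after.node_for, keys)
mod_moved = moved_fraction(lambda k: stable_hash(k) % 4, lambda k: stable_hash(k) % 5, keys)
print(f"ring: {ring_moved:.0%} of keys moved; modulo: {mod_moved:.0%} of keys moved")
```

Growing from four nodes to five, the ring should relocate roughly a fifth of the keys, and every relocated key lands on the new node; modulo placement relocates roughly four-fifths. That gap is what the extra moving parts buy.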
A reflex drill
Read each prompt, then name your block and its cost before reading on.
- “A redirect service gets 100k reads/sec, mostly the same hot links.” → Cache the hot mappings; the DB only sees misses. Cost: invalidation when a link changes, and a stampede risk if a hot key expires.
- “Users in Europe see high latency reading their profile.” → A read replica in-region (locality). Cost: that replica may serve slightly stale data (eventual consistency) — fine for a profile.
- “The orders table is billions of rows and growing.” → Shard by a key that spreads load (e.g. customer id); index the common query columns. Cost: cross-shard reporting must scatter-gather; keep an order’s data on one shard.
- “We can’t lose a committed payment, ever.” → Synchronous replication with a quorum, CP under partition. Cost: writes pay coordination latency and can be refused during a partition — acceptable for money.
- “Show a typing indicator in chat.” → WebSockets (bidirectional, high-frequency). Cost: a pub/sub backplane to fan out across connection servers.
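The payment prompt relies on the quorum rule from the table, R + W > N. The check below is a sketch under simplifying assumptions: each write carries a version number, a reader keeps the highest version among the R replicas it hears from, and no replica fails mid-operation. The function name is hypothetical. It enumerates every possible write quorum and read quorum to show that a read is guaranteed to see the latest write exactly when R + W > N.

```python
from itertools import combinations


def quorum_read_sees_latest(n, w, r):
    """Return True if every R-replica read overlaps every W-replica write."""
    replicas = range(n)
    for write_set in combinations(replicas, w):
        # Version 2 reaches only the write quorum; the other replicas still hold version 1.
        versions = {i: (2 if i in write_set else 1) for i in replicas}
        for read_set in combinations(replicas, r):
            # The reader trusts the highest version among its R replies.
            if max(versions[i] for i in read_set) != 2:
                return False  # this read quorum missed the latest write entirely
    return True


for n, w, r in [(3, 2, 2), (3, 3, 1), (3, 1, 1), (5, 3, 2)]:
    print(f"N={n} W={w} R={r}: R+W>N is {r + w > n}, "
          f"read always sees latest: {quorum_read_sees_latest(n, w, r)}")
```

The two columns agree in every row because two sets of sizes R and W drawn from N replicas must share a member once R + W > N. The sketch only shows why the overlap holds; the price is that each write waits on W acknowledgements, which is the coordination latency the payment prompt accepts.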
The meta-rule
For every block you place, say the requirement that justifies it and the new problem it introduces. “I’ll add a cache (reads dominate), which creates an invalidation problem I’ll handle with TTLs plus event-based eviction.” That two-part sentence — justification and cost — is the difference between naming technologies and designing systems. Carry it into Chapter 4.
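As one illustration of that two-part sentence, here is a minimal cache-aside sketch in Python with both invalidation mechanisms it names: a TTL, plus explicit eviction when the underlying data changes. The `CacheAside` class, its `evict` and `db_reads` members, the dict standing in for the database, and the fake clock are all hypothetical. It deliberately leaves out stampede protection, the other cost named in the redirect-service prompt.

```python
import time


class CacheAside:
    """Check the cache first; on a miss or expiry, load from the DB and populate."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)
        self.db_reads = 0   # counts trips to the backing store

    def get(self, key):
        hit = self._entries.get(key)
        if hit is not None and hit[1] > self._clock():
            return hit[0]  # fresh hit: the DB is not touched
        value = self._load(key)  # miss or expired entry: go to the DB
        self.db_reads += 1
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def evict(self, key):
        # Event-based eviction: call this when the underlying row changes.
        self._entries.pop(key, None)


db = {"abc": "https://example.com/old"}  # hypothetical short-link table
now = [0.0]  # fake clock so the demo is deterministic
cache = CacheAside(db.__getitem__, ttl_seconds=30, clock=lambda: now[0])

first = cache.get("abc")               # miss: loads from the DB
second = cache.get("abc")              # hit: served from the cache
db["abc"] = "https://example.com/new"  # the link changes...
cache.evict("abc")                     # ...and the change event evicts the stale entry
after_evict = cache.get("abc")         # miss: reloads the new value
now[0] = 100.0                         # TTL elapses with no change event
after_ttl = cache.get("abc")           # expired: reloads once more
print(first, second, after_evict, after_ttl, cache.db_reads)
```

The TTL acts as the safety net if an eviction event is ever lost, while the explicit eviction keeps the usual staleness window far shorter than the TTL.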