Read-heavy vs write-heavy systems

The read:write ratio is the first number you should estimate — it points to entirely different toolkits of caching and replication versus partitioning and queues.

Why the ratio drives the design

Early in any problem, estimate the read:write ratio. It’s the single number that most shapes your architecture, because the techniques that help reads (caching, replicas, denormalization) are mostly different from the ones that help writes (sharding, queues, write-optimized stores) — and some even trade against each other.

Read-heavy systems

Reads vastly outnumber writes (often 100:1 or 1000:1) — social feeds, news, e-commerce browsing, video. Optimize the read path:

Caching — the biggest lever: serve hot data from memory (Redis) and the edge (CDN) so most reads never touch the DB.
Read replicas — fan reads across many follower copies of the database; the primary handles only writes.
Denormalization / precomputation — store data in the shape it’s read, avoiding expensive joins at read time (e.g. precompute a timeline).
Indexes — add them freely; reads benefit and writes are rare enough to absorb the cost.

Trade-off: these mostly buy read speed with staleness (cache and replicas lag) and storage (duplicated, denormalized data). Usually a fine deal — feeds tolerate eventual consistency.

Write-heavy systems

Writes dominate or are very high-volume — logging, metrics/IoT telemetry, analytics ingestion, activity tracking. Optimize the write path:

Sharding/partitioning — spread writes across many nodes so no single primary is the bottleneck (the main move).
Message queues / buffering — absorb spikes and decouple ingestion from processing; the client write returns fast, work happens async (write-back idea).
Write-optimized stores — LSM-tree engines (Cassandra, RocksDB) batch writes into sequential appends instead of random in-place updates; great write throughput.
Batching — group many writes into one bulk operation to amortize overhead.
Fewer indexes — every index taxes each write, so keep them minimal and push search to a separate, asynchronously-updated index.

Trade-off: these buy write throughput with read complexity (data scattered across shards; eventual consistency; reads may need scatter-gather) and operational complexity (queues, async pipelines).

The side-by-side

	Read-heavy	Write-heavy
Main lever	Caching + read replicas	Sharding + queues
Data shape	Denormalized for reads	Append/write-optimized (LSM)
Indexes	Many (reads benefit)	Few (writes get taxed)
Consistency	Eventual is usually fine	Often async/eventual
Example	Twitter timeline, CDN	Metrics ingestion, logging

The fan-out connection

Many systems are both — heavy writes feeding even heavier reads (a tweet is written once, read by millions). That’s the classic fan-out-on-write vs fan-out-on-read decision: precompute on write to make reads cheap, or compute on read to make writes cheap. You often split by case (precompute for normal users, compute-on-read for celebrities) — a pattern you’ll apply repeatedly in Chapter 4.

The interview cue

State the ratio and let it steer you: “This is ~1000:1 read-heavy, so the design is dominated by caching and read replicas, and I’ll denormalize the timeline for cheap reads — accepting eventual consistency. The write path is comparatively light, so a single sharded primary suffices.” Estimating the ratio first and choosing the matching toolkit is the move.