Back-of-the-envelope estimation
A few rough numbers — QPS, storage, bandwidth — that justify sharding, caching, and replication instead of guessing at them.
Why bother with the math
Estimation isn’t about precision; it’s about earning your design decisions. “We’ll need to shard” is hand-waving. “We’re at ~40k writes/sec and ~3 TB/year, so a single node won’t hold it — we shard by user id” is engineering. The number turns an opinion into a justified choice.
The numbers worth memorizing
Powers of ten (data): 1 thousand = 10³, 1 million = 10⁶, 1 billion = 10⁹, 1 trillion = 10¹². A “KB” is 10³ bytes, MB 10⁶, GB 10⁹, TB 10¹², PB 10¹⁵.
Latency ballparks (order of magnitude):
- Memory reference: ~100 ns
- Read 1 MB sequentially from memory: ~10 µs
- SSD random read: ~100 µs
- Round trip within a datacenter: ~500 µs
- Read 1 MB from SSD: ~1 ms
- Disk seek (spinning): ~10 ms
- Round trip across the world: ~150 ms
The takeaway you’ll actually use: memory ≫ SSD ≫ disk ≫ network across regions. That ordering is why caches and same-region reads matter.
Time: there are ~86,400 seconds/day, round to 10⁵. So “1 million events/day” ≈ ~12/sec average; “1 billion/day” ≈ ~12k/sec average. Peak is usually 2–10× the average — say so.
A repeatable recipe
- Start from users. Daily active users (DAU) × actions per user per day.
- Convert to QPS. Divide by ~10⁵ (seconds/day) for the average; multiply by a peak factor (≈2–10×) for the peak.
- Split reads vs writes. Most consumer systems are read-heavy (often 100:1 or more) — this decides how hard you lean on caching and replicas.
- Storage = objects/day × size × retention. Add replication factor (×3 is common) and headroom.
- Bandwidth = QPS × payload size.
A worked example: a URL shortener
- Assume 100M new URLs/day. Average write QPS = 100M / 10⁵ = ~1,000 writes/sec; peak ~ a few thousand.
- Read:write ≈ 100:1 → ~100k reads/sec average. That is why redirects must be served from cache, not a database.
- Storage: each record ~500 bytes → 100M × 500 B = 50 GB/day → ~18 TB/year. With ×3 replication, ~55 TB/year. A single box won’t hold years of this → partition.
Notice every number led somewhere: reads → cache, storage → sharding. That’s the point.
How to present it
Round aggressively and say your assumptions out loud (“let’s assume 100M writes a day”). Interviewers care that the method is sound, not that you nailed the constant. If they push back on an assumption, adjust and move on — the recalculation is itself a good signal.