Designing a content delivery network
Build a global edge-caching network — points of presence, request routing, a cache hierarchy with origin shield, and how content stays fresh.
The problem
Design a CDN: a globally distributed network of caching servers that delivers content (images, video, JS/CSS, downloads) from a location near each user, cutting latency and offloading the origin. Chapter 3 covered when to use one; here we build it.
Step 1 — Requirements
Functional: cache and serve static/media content from edge locations near users; route each user to a nearby edge; fetch from origin on a miss; support purge/invalidation; serve over HTTPS.
Non-functional: low latency (proximity), massive scale and bandwidth (petabytes, huge QPS), high availability (survive edge/origin failures), and high cache hit ratio (the metric that defines a CDN’s value — offload).
Step 2 — Estimation that shapes it
The driving numbers: at a 90–95% hit ratio, origin sees only 5–10% of requests; total egress bandwidth is the dominant cost; and object sizes range from KB (CSS) to GB (video), which calls for different caching and segmentation strategies.
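The arithmetic behind "origin sees a fraction of traffic" is just multiplying miss rates down the hierarchy. Below is a minimal Python sketch that assumes each tier behaves independently. The 1M req/s load and the 80% parent-tier hit ratio are made-up illustration values, not figures from the text. For bandwidth, pass byte hit ratios rather than request hit ratios, because a few large video objects can dominate bytes while many small CSS files dominate request counts.

```python
from math import prod


def reaching_origin(total: float, tier_hit_ratios: list[float]) -> float:
    """Load that misses every cache tier and falls through to origin.

    `total` can be requests/s or Gbps; each entry in `tier_hit_ratios` is the
    hit ratio of the traffic *arriving at that tier* (edge first, then parent).
    Treating tiers as independent is a simplifying assumption.
    """
    for h in tier_hit_ratios:
        if not 0.0 <= h <= 1.0:
            raise ValueError(f"hit ratio out of range: {h}")
    # Each tier passes on only its misses, so the fall-through fractions multiply.
    return total * prod(1.0 - h for h in tier_hit_ratios)


# Hypothetical load: 1M req/s arriving at the edges.
edge_only = round(reaching_origin(1_000_000, [0.95]))          # 50000
with_parent = round(reaching_origin(1_000_000, [0.95, 0.80]))  # 10000
print(edge_only, with_parent)
```

Adding even a modest parent tier cuts origin load by another factor of five in this example, which is the quantitative case for the hierarchy in Step 3.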
Step 3 — Topology: PoPs and a cache hierarchy
- Edge servers grouped into PoPs (points of presence) in many cities.
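- A two-tier cache hierarchy backed by an origin shield: edge caches → regional/parent caches → origin shield → origin. A miss at the edge checks a parent before bothering origin, and the shield funnels the remaining misses through a single layer, so origin sees very few requests (and isn’t stampeded).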
user → nearest edge PoP ─miss→ regional cache ─miss→ origin shield ─miss→ origin
(a hit at any tier is served immediately)
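The request path in the diagram is a chain of pull-through caches: each tier answers from its own store or asks the tier above it, then keeps a copy on the way back down. Below is a minimal Python sketch. `CacheTier` and `fetch_from_origin` are hypothetical names, and the sketch deliberately leaves out TTLs, eviction, and the coalescing of concurrent misses that a real origin shield relies on.

```python
from typing import Callable, Optional


class CacheTier:
    """One cache layer (edge, regional, or shield) that pulls from upstream on a miss."""

    def __init__(self, name: str, parent: Optional["CacheTier"] = None,
                 origin: Optional[Callable[[str], bytes]] = None) -> None:
        if (parent is None) == (origin is None):
            raise ValueError("a tier needs exactly one upstream: a parent tier or origin")
        self.name = name
        self.parent = parent
        self.origin = origin            # only the top tier (the shield) talks to origin
        self.store: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.store:           # hit: serve immediately
            self.hits += 1
            return self.store[key]
        self.misses += 1
        if self.parent is not None:     # miss: ask the next tier up
            body = self.parent.get(key)
        else:
            body = self.origin(key)     # shield miss: the only path to origin
        self.store[key] = body          # fill this tier on the way back down
        return body


# Demo wiring: two edges share one regional cache, which sits behind the shield.
origin_calls: list[str] = []


def fetch_from_origin(key: str) -> bytes:
    origin_calls.append(key)
    return f"<body of {key}>".encode()


shield = CacheTier("shield", origin=fetch_from_origin)
regional = CacheTier("regional", parent=shield)
edge_a = CacheTier("edge-a", parent=regional)
edge_b = CacheTier("edge-b", parent=regional)

edge_a.get("/logo.png")   # misses at every tier: one origin fetch
edge_b.get("/logo.png")   # edge-b misses, regional hits, origin untouched
print(len(origin_calls))  # 1
```

The second edge’s miss never reaches origin, which is the "origin sees very few requests" property in miniature.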
Step 4 — Request routing (getting users to the right edge)
How does a user reach the nearest PoP?
- DNS-based — the CDN’s authoritative DNS returns the IP of a nearby PoP, chosen by the resolver’s location and each PoP’s health and load. Simple and widely used.
- Anycast — every PoP announces the same IP prefix; BGP routes the user to the topologically nearest one. Failover comes naturally: when a PoP goes down, its route announcement is withdrawn and traffic shifts to the next-nearest PoP.
Routing also factors in load and health, not just distance.
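The geo-DNS decision can be sketched as follows: drop unhealthy or near-full PoPs, then pick the lowest score of distance plus a load penalty. This is a minimal Python sketch. The PoP list, coordinates, `max_load` cutoff, and `load_penalty_km` weight are hypothetical tuning values, and great-circle distance stands in for network proximity (real CDNs often steer by measured latency instead). Anycast needs no such code at request time, since BGP makes the choice.

```python
import math
from dataclasses import dataclass


@dataclass
class PoP:
    name: str
    lat: float
    lon: float
    healthy: bool
    load: float  # fraction of capacity in use, 0.0 to 1.0


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance using the haversine formula."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def pick_pop(lat: float, lon: float, pops: list[PoP],
             max_load: float = 0.9, load_penalty_km: float = 2000.0) -> PoP:
    """Choose a PoP for a resolver at (lat, lon): healthy, not saturated, nearby, lightly loaded."""
    candidates = [p for p in pops if p.healthy and p.load < max_load]
    if not candidates:
        raise RuntimeError("no healthy PoP with spare capacity")
    # Score = distance plus a load penalty, so a busy PoP loses to a slightly farther idle one.
    return min(candidates,
               key=lambda p: distance_km(lat, lon, p.lat, p.lon) + load_penalty_km * p.load)


pops = [
    PoP("lhr", 51.51, -0.13, True, 0.2),
    PoP("fra", 50.11, 8.68, True, 0.2),
    PoP("jfk", 40.71, -74.01, True, 0.2),
]
paris = (48.86, 2.35)
print(pick_pop(*paris, pops).name)   # lhr: nearest, equal load
pops[0].healthy = False
print(pick_pop(*paris, pops).name)   # fra: lhr is marked down
```

The single `load_penalty_km` knob expresses how many kilometres of extra distance a fully loaded PoP is "worth"; tuning it trades latency for load balance.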
Step 5 — Caching, freshness, and invalidation
- Pull (lazy) — the edge fetches from origin on the first request and caches per Cache-Control/TTL. This is the default.
- Push — pre-load large, known assets (e.g. video) to edges ahead of demand.
- Freshness — honor Cache-Control/ETag; use versioned URLs (app.a1b2.js) so a deploy changes the URL and needs no invalidation at all (the cleanest strategy).
- Purge/invalidation — when content must change under the same URL, propagate a purge to all edges (a fan-out via a control plane); support soft purge (mark the object stale and keep serving it while it is revalidated).
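The pull, TTL, and purge behaviours above fit in one small edge-cache class, and versioned URLs reduce to a content-hash filename. Below is a minimal Python sketch. `EdgeCache`, its `fetch` hook, `revalidate_pending`, and the 8-character SHA-256 naming scheme are hypothetical choices (the text only shows `app.a1b2.js`). This sketch serves a soft-purged or expired object until it is refetched, with no upper bound; HTTP’s `stale-while-revalidate` Cache-Control extension is the standard way to bound that window.

```python
import hashlib
import time
from dataclasses import dataclass
from typing import Callable


def versioned_name(path: str, content: bytes, digits: int = 8) -> str:
    """app.js + content -> app.<hash>.js, so changed content gets a brand-new URL."""
    digest = hashlib.sha256(content).hexdigest()[:digits]
    stem, dot, ext = path.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{path}.{digest}"


@dataclass
class Entry:
    body: bytes
    expires_at: float
    stale: bool = False   # set by a soft purge


class EdgeCache:
    """Pull-through edge cache with TTL freshness, hard purge, and soft purge."""

    def __init__(self, fetch: Callable[[str], tuple[bytes, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch               # hypothetical origin hook: key -> (body, ttl_seconds)
        self.clock = clock
        self.entries: dict[str, Entry] = {}
        self.pending: set[str] = set()   # keys to revalidate in the background

    def get(self, key: str) -> tuple[bytes, str]:
        entry = self.entries.get(key)
        if entry is None:
            return self._fill(key), "MISS"
        if entry.stale or self.clock() >= entry.expires_at:
            self.pending.add(key)        # serve stale now, refresh later
            return entry.body, "STALE"
        return entry.body, "HIT"

    def _fill(self, key: str) -> bytes:
        body, ttl = self.fetch(key)
        self.entries[key] = Entry(body, self.clock() + ttl)
        return body

    def purge(self, key: str, soft: bool = True) -> None:
        if soft and key in self.entries:
            self.entries[key].stale = True   # keep serving until revalidated
        else:
            self.entries.pop(key, None)      # hard purge: next request is a MISS

    def revalidate_pending(self) -> None:
        # A real edge would send a conditional GET (If-None-Match with the ETag);
        # this sketch simply refetches.
        for key in sorted(self.pending):
            self._fill(key)
        self.pending.clear()


now = [0.0]                                   # fake clock so the demo is deterministic
origin = {"/app.js": b"v1"}
cache = EdgeCache(lambda k: (origin[k], 60.0), clock=lambda: now[0])
statuses = [cache.get("/app.js")[1], cache.get("/app.js")[1]]   # MISS, HIT
origin["/app.js"] = b"v2"
cache.purge("/app.js")                        # soft purge
statuses.append(cache.get("/app.js")[1])      # STALE: still serving v1
cache.revalidate_pending()
statuses.append(cache.get("/app.js")[1])      # HIT: now v2
print(statuses)                               # ['MISS', 'HIT', 'STALE', 'HIT']
print(versioned_name("app.js", b"v2"))        # app.<8 hex chars>.js
```

With versioned names, the purge path above is never exercised for deploys: `app.js` with new content simply becomes a different key.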
Step 6 — Big files: segmentation
Video and large downloads are split into chunks/segments so edges cache and serve ranges independently, support adaptive bitrate streaming, and don’t hold huge objects whole. (Reused in the YouTube/Netflix designs.)
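For a large single-file download, one way to do this is fixed-size range chunking: the edge maps a client’s HTTP Range request onto fixed-size chunks, caches each chunk as its own object, and fetches only the missing chunks from its parent. Below is a minimal Python sketch. The 1 MiB chunk size and the `chunk_key` format are hypothetical. Adaptive-bitrate video usually arrives already segmented (each segment at each bitrate has its own URL), so this range mapping matters mostly for big single-file objects.

```python
def chunks_for_range(start: int, end: int, chunk_size: int) -> list[tuple[int, int, int]]:
    """Split an inclusive byte range [start, end] into per-chunk pieces.

    Returns (chunk_index, first_byte, last_byte) with offsets relative to that chunk,
    so each chunk can be cached and fetched from the parent independently.
    """
    if start < 0 or end < start or chunk_size <= 0:
        raise ValueError("invalid byte range or chunk size")
    pieces = []
    for i in range(start // chunk_size, end // chunk_size + 1):
        chunk_start = i * chunk_size
        lo = max(start, chunk_start) - chunk_start
        hi = min(end, chunk_start + chunk_size - 1) - chunk_start
        pieces.append((i, lo, hi))
    return pieces


def chunk_key(url: str, index: int) -> str:
    """Hypothetical cache-key format: one cache entry per chunk of the object."""
    return f"{url}#chunk={index}"


MIB = 1024 * 1024
# "Range: bytes=1000000-2100000" against 1 MiB chunks touches chunks 0, 1, and 2.
for idx, lo, hi in chunks_for_range(1_000_000, 2_100_000, MIB):
    print(chunk_key("/downloads/big.iso", idx), lo, hi)
```

Because chunk boundaries are fixed, two users requesting overlapping ranges share the same cached chunks even though their Range headers differ.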
Trade-offs to raise
- Freshness vs hit ratio — long TTLs maximize offload but risk staleness; versioned URLs sidestep the conflict.
- Storage at edge is finite — evict with LRU; not everything fits, so keep the hot head of the popularity distribution at the edge and let the long tail fall through to the parent tiers.
- Consistency of purge — invalidation isn’t instant across thousands of edges, so some users briefly see stale content; that window is acceptable for most content.
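Because edge objects range from KB to GB, an edge LRU is more naturally bounded by bytes than by entry count. Below is a minimal Python sketch of plain LRU eviction as named above. The byte budget, the `ByteLRU` name, and the rule that an object larger than the whole budget bypasses the cache are assumptions for illustration.

```python
from collections import OrderedDict
from typing import Optional


class ByteLRU:
    """LRU cache bounded by total bytes rather than entry count."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.items: "OrderedDict[str, bytes]" = OrderedDict()  # oldest first

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # touched: now most recently used
        return self.items[key]

    def put(self, key: str, body: bytes) -> bool:
        size = len(body)
        if size > self.capacity:
            return False                     # too big for this tier; pass it through uncached
        if key in self.items:
            self.used -= len(self.items.pop(key))
        while self.used + size > self.capacity:
            _, evicted = self.items.popitem(last=False)   # oldest = least recently used
            self.used -= len(evicted)
        self.items[key] = body
        self.used += size
        return True


lru = ByteLRU(capacity_bytes=10)
lru.put("a", b"aaaa")
lru.put("b", b"bbbb")
lru.get("a")                       # "a" becomes most recently used
lru.put("c", b"cccc")              # needs 12 of 10 bytes, so "b" (the LRU entry) is evicted
print(list(lru.items), lru.used)   # ['a', 'c'] 8
```

`OrderedDict.move_to_end` and `popitem(last=False)` are both constant time, so lookups and evictions stay cheap per object; objects evicted here are simply refetched from the parent tier on their next request.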
The interview cue
“PoPs worldwide with a two-tier cache hierarchy + origin shield so origin sees a trickle; anycast or geo-DNS routing to the nearest healthy PoP; pull caching with versioned URLs to avoid invalidation, plus a purge control plane when needed; segmented large media.” Hit ratio, routing, and the cache hierarchy are the heart of the answer; eviction and purge mechanics come next.