Designing a content delivery network

Build a global edge-caching network — points of presence, request routing, a cache hierarchy with origin shield, and how content stays fresh.

The problem

Design a CDN: a globally distributed network of caching servers that delivers content (images, video, JS/CSS, downloads) from a location near each user, cutting latency and offloading the origin. Chapter 3 covered when to use one; here we build it.

Step 1 — Requirements

Functional: cache and serve static/media content from edge locations near users; route each user to a nearby edge; fetch from origin on a miss; support purge/invalidation; serve over HTTPS.

Non-functional: low latency (proximity), massive scale and bandwidth (petabytes, huge QPS), high availability (survive edge/origin failures), and high cache hit ratio (the metric that defines a CDN’s value, since it sets how much traffic the origin is spared).

Step 2 — Estimation that shapes it

The driving numbers: a high hit ratio (say 90–95%) means only 5–10% of requests leave the edge, and a parent tier shrinks the share that reaches origin even further. Total egress bandwidth is the dominant cost. Object sizes range from KB (CSS) to GB (video), which calls for different caching and segmentation strategies.
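The offload arithmetic is worth doing explicitly. The sketch below assumes each tier's hit ratio is an independent fraction of the traffic that tier sees, and the workload numbers are hypothetical. It also separates request hit ratio from byte hit ratio, because bandwidth cost follows bytes, and a few large video objects can dominate bytes even when they are a small share of requests.

```python
def origin_rps(edge_rps: float, edge_hit: float, parent_hit: float = 0.0) -> float:
    """Requests/s that reach origin.

    edge_hit:   fraction of all requests answered at the edge.
    parent_hit: fraction of edge misses answered by the regional/shield tier.
    """
    return edge_rps * (1.0 - edge_hit) * (1.0 - parent_hit)


def origin_egress_gbps(edge_gbps: float, byte_hit: float) -> float:
    """Origin egress given a *byte* hit ratio (share of bytes served from cache).

    Byte hit ratio can differ a lot from request hit ratio when object sizes
    range from KB (CSS) to GB (video).
    """
    return edge_gbps * (1.0 - byte_hit)


# Hypothetical workload: 1M requests/s at the edge, 95% edge hits,
# and a parent tier that absorbs 80% of the remaining misses.
print(round(origin_rps(1_000_000, 0.95)))        # 50000 req/s without a parent tier
print(round(origin_rps(1_000_000, 0.95, 0.80)))  # 10000 req/s with one
```

A parent tier with an 80% hit ratio on edge misses turns a 5% leak into a 1% leak, a fivefold reduction in origin load without touching the edges.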

Step 3 — Topology: PoPs and a cache hierarchy

Edge servers grouped into PoPs (points of presence) in many cities.
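A two-tier cache hierarchy plus an origin shield: edge caches → regional/parent caches → origin shield (a designated cache layer that every remaining miss for an origin passes through) → origin. A miss at the edge checks a parent before bothering origin, so origin sees very few requests (and isn’t stampeded).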

user → nearest edge PoP ─miss→ regional cache ─miss→ origin shield → origin
            (hit: serve immediately)
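The lookup path can be sketched as a walk from the nearest tier outward, filling every tier that missed on the way back. `CacheTier` and `tiered_get` are hypothetical names; each tier is a plain dict with no TTL or eviction, and the sketch omits request collapsing (merging concurrent misses for the same key into one upstream fetch), which real shields also rely on to avoid stampedes.

```python
from typing import Callable, Dict, List, Optional


class CacheTier:
    """One cache layer (edge, regional, or shield), backed by a plain dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.store.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.store[key] = value


def tiered_get(key: str, tiers: List[CacheTier],
               fetch_origin: Callable[[str], bytes]) -> bytes:
    """Check tiers nearest-first; go to origin only if every tier misses."""
    missed: List[CacheTier] = []
    value: Optional[bytes] = None
    for tier in tiers:
        value = tier.get(key)
        if value is not None:
            break  # hit at this tier
        missed.append(tier)
    else:
        value = fetch_origin(key)  # every tier missed
    # Fill the tiers that missed so the next request stops earlier.
    for tier in missed:
        tier.put(key, value)
    return value


origin_calls: List[str] = []

def fetch_origin(key: str) -> bytes:
    origin_calls.append(key)
    return b"body of " + key.encode()

# Two edge PoPs share one regional cache and one shield.
regional, shield = CacheTier("regional"), CacheTier("shield")
edge_a, edge_b = CacheTier("edge-a"), CacheTier("edge-b")
tiered_get("/logo.png", [edge_a, regional, shield], fetch_origin)
tiered_get("/logo.png", [edge_b, regional, shield], fetch_origin)
print(origin_calls)  # ['/logo.png']: edge B's miss was absorbed by the regional tier
```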

Step 4 — Request routing (getting users to the right edge)

How does a user reach the nearest PoP?

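DNS-based — the CDN’s authoritative DNS returns the IP of a nearby PoP, chosen from the location of the user’s DNS resolver plus PoP health and load. Simple and widely used; the resolver can be far from the user, which the EDNS Client Subnet extension mitigates by passing part of the client’s IP address.
Anycast — every PoP announces the same IP prefix; BGP delivers each user’s packets to the PoP that is nearest in routing terms, which is usually, though not always, the lowest-latency one. Failover is natural: a dead PoP withdraws its route.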

Routing also factors load and health, not just distance.
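The selection a DNS-based mapping system makes for each resolver region can be sketched as a filter-then-rank step. The `PoP` fields, the 0.85 load threshold, and the fallback policy are all assumptions for illustration; production systems blend these signals into richer scores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PoP:
    name: str
    rtt_ms: float   # estimated latency from the client's region (assumed measured elsewhere)
    healthy: bool   # result of health checks
    load: float     # utilization, 0.0 to 1.0


def choose_pop(pops: List[PoP], max_load: float = 0.85) -> Optional[PoP]:
    """Lowest-latency healthy PoP with headroom; else the least-loaded healthy PoP."""
    healthy = [p for p in pops if p.healthy]
    if not healthy:
        return None  # nothing to route to; a real system would fall back further
    with_headroom = [p for p in healthy if p.load < max_load]
    if with_headroom:
        return min(with_headroom, key=lambda p: p.rtt_ms)
    return min(healthy, key=lambda p: p.load)


pops = [
    PoP("fra", rtt_ms=12, healthy=False, load=0.30),  # closest, but failing health checks
    PoP("ams", rtt_ms=18, healthy=True, load=0.92),   # healthy, but over the load threshold
    PoP("lon", rtt_ms=25, healthy=True, load=0.40),
]
print(choose_pop(pops).name)  # lon
```

Filtering on health first and load second means distance only decides among PoPs that can actually serve the request well.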

Step 5 — Caching, freshness, and invalidation

Pull (lazy) — edge fetches from origin on first request, caches per Cache-Control/TTL. The default. Push — pre-load large known assets (video).
Freshness — honor Cache-Control/ETag; use versioned URLs (app.a1b2.js) so a deploy changes the URL and needs no invalidation at all (the cleanest strategy).
Purge/invalidation — when content must change under the same URL, propagate a purge to all edges (a fan-out via a control plane); support soft purge (serve stale while revalidating).
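Versioned URLs are usually produced at build time by putting a hash of the file’s bytes into its name. The `a1b2` in the example above is illustrative; the choice of SHA-256 truncated to 8 hex characters and the `versioned_name` helper below are assumptions, not a prescribed scheme.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_name(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the extension: static/app.js -> static/app.<hash>.js."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


v1 = versioned_name("static/app.js", b"console.log('v1');")
v2 = versioned_name("static/app.js", b"console.log('v2');")
# Different content yields a different name, so a deploy never needs a purge;
# the same content always yields the same name, so unchanged files keep their cache entries.
```

Because a hashed URL never changes meaning, it can be served with `Cache-Control: public, max-age=31536000, immutable`, while the HTML that references it keeps a short TTL.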
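Freshness and soft purge can be sketched together, since a soft purge simply marks an entry stale early. This is a simplified model, not a full HTTP cache: it reads `s-maxage`, `max-age`, and `stale-while-revalidate`, ignores `no-store`, `private`, and `no-cache`, treats a missing TTL as 0, and interprets soft purge as "expire now but keep the body". `Entry`, `store`, and the decision strings are hypothetical names. A hard purge would just delete the entry.

```python
from dataclasses import dataclass
from typing import Dict


def parse_cache_control(header: str) -> Dict[str, str]:
    """'public, max-age=60' -> {'public': '', 'max-age': '60'} (no quoted-comma handling)."""
    directives: Dict[str, str] = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"')
    return directives


def seconds(directives: Dict[str, str], key: str) -> float:
    value = directives.get(key, "")
    return float(value) if value.isdigit() else 0.0


@dataclass
class Entry:
    body: bytes
    expires_at: float   # time stored + TTL
    swr: float          # stale-while-revalidate grace window, in seconds

    def decide(self, now: float) -> str:
        if now < self.expires_at:
            return "hit"                   # fresh: serve from cache
        if now < self.expires_at + self.swr:
            return "stale-revalidate"      # serve stale, refresh from parent in background
        return "miss"                      # too stale: fetch before serving

    def soft_purge(self, now: float) -> None:
        # Mark stale immediately but keep the body, so the grace window still applies.
        self.expires_at = min(self.expires_at, now)


def store(body: bytes, cache_control: str, now: float) -> Entry:
    d = parse_cache_control(cache_control)
    # Shared caches prefer s-maxage over max-age; this sketch treats "no TTL" as 0.
    ttl = seconds(d, "s-maxage") if "s-maxage" in d else seconds(d, "max-age")
    return Entry(body, now + ttl, seconds(d, "stale-while-revalidate"))


e = store(b"body{}", "public, max-age=60, stale-while-revalidate=30", now=1000.0)
print(e.decide(1030.0), e.decide(1070.0), e.decide(1100.0))  # hit stale-revalidate miss
e.soft_purge(now=1010.0)
print(e.decide(1020.0))  # stale-revalidate
```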

Step 6 — Big files: segmentation

Video and large downloads are split into chunks/segments so edges cache and serve ranges independently, support adaptive bitrate streaming, and don’t hold huge objects whole. (Reused in the YouTube/Netflix designs.)
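For large downloads, an edge can map an incoming HTTP byte range onto fixed-size segments and cache each segment under its own key. The 2 MiB segment size and `segments_for_range` are assumptions for illustration. Adaptive bitrate streaming (HLS/DASH) usually works differently: the packager emits separate segment files per rendition, and each file is cached as an ordinary object.

```python
from typing import List, Tuple

SEGMENT_BYTES = 2 * 1024 * 1024  # assumed segment size: 2 MiB


def segments_for_range(start: int, end: int,
                       seg: int = SEGMENT_BYTES) -> List[Tuple[int, int, int]]:
    """Map an inclusive byte range [start, end] onto fixed-size segments.

    Returns (segment_index, first_byte_in_segment, last_byte_in_segment) tuples,
    with offsets relative to the start of each segment.
    """
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    out = []
    for idx in range(start // seg, end // seg + 1):
        seg_start = idx * seg
        lo = max(start, seg_start) - seg_start
        hi = min(end, seg_start + seg - 1) - seg_start
        out.append((idx, lo, hi))
    return out


# Tiny segments to make the mapping visible: bytes 5..24 span three segments.
print(segments_for_range(5, 24, seg=10))  # [(0, 5, 9), (1, 0, 9), (2, 0, 4)]
```

Each segment is fetched and cached independently, so a viewer seeking into the middle of a file pulls only the segments it needs, and eviction can drop cold segments without discarding the whole object.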

Trade-offs to raise

Freshness vs hit ratio — long TTLs maximize offload but risk staleness; versioned URLs sidestep the conflict.
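Storage at edge is finite — evict with LRU (or a variant); not everything fits, so edges keep the hot head of the popularity distribution and let the long tail of rarely requested objects fall through to parent tiers.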
Consistency of purge — invalidation isn’t instant across thousands of edges; acceptable for most content.
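Because CDN objects vary so much in size, edge eviction is naturally budgeted in bytes rather than item count. `ByteLRU` below is a hypothetical minimal version built on `collections.OrderedDict`; real caches often add an admission filter so one-hit wonders do not evict hot objects.

```python
from collections import OrderedDict
from typing import Optional


class ByteLRU:
    """LRU cache bounded by total stored bytes rather than number of items."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        value = self.items.get(key)
        if value is not None:
            self.items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: bytes) -> None:
        if len(value) > self.capacity:
            return  # too large for this tier; serve it without caching
        if key in self.items:
            self.used -= len(self.items.pop(key))
        self.items[key] = value
        self.used += len(value)
        while self.used > self.capacity:
            _, evicted = self.items.popitem(last=False)  # drop least recently used
            self.used -= len(evicted)


c = ByteLRU(10)
c.put("a", b"12345")
c.put("b", b"12345")
c.get("a")            # "a" is now more recent than "b"
c.put("c", b"123")    # 13 bytes > 10: evicts "b"
print(list(c.items))  # ['a', 'c']
```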

The interview cue

“PoPs worldwide with a two-tier cache hierarchy + origin shield so origin sees a trickle; anycast or geo-DNS routing to the nearest healthy PoP; pull caching with versioned URLs to avoid invalidation, plus a purge control plane when needed; segmented large media.” Hit ratio, routing, and the cache hierarchy are the heart of the answer; eviction and purge mechanics come next.