System design course
Ch.4 · Designing real systems · concept · 8 min read

Designing Netflix

Streaming a curated catalog — pre-encoded content, an ISP-embedded CDN for efficiency, personalized recommendations, and graceful global delivery.


The problem

Design Netflix: stream a curated catalog of movies and shows to millions of concurrent viewers, with high quality, near-instant start, and strong recommendations. Unlike YouTube, there are no user uploads: the catalog is known, finite, and fully under your control. That changes the design: you pre-encode everything optimally and focus on delivery efficiency and personalization.

Step 1 — Requirements

Functional: browse/search a catalog; stream with adaptive quality; personalized recommendations and rows; resume across devices; profiles; downloads.

Non-functional: massive read bandwidth (Netflix alone has been reported at roughly 15% of global downstream internet traffic), low startup latency with minimal rebuffering, high availability, global reach, personalization.

Step 2 — The catalog is known: pre-encode everything

Because the catalog is finite and controlled, encode each title ahead of time into many renditions, and tune the encode to the content: per-title (or per-scene) encoding picks bitrates based on how hard the material is to compress, saving bandwidth at the same perceived quality. Store all renditions as ABR segments (HLS/DASH) ready to serve. Unlike YouTube, there is no stream of fresh uploads that must be transcoded quickly.
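As a rough illustration of the per-title idea, the sketch below scales a fixed bitrate ladder by a single content complexity score. The base ladder values, the `complexity` score, and the 0.5x to 1.5x scaling range are all assumptions made for this example; a real pipeline would run trial encodes and measure quality rather than use one number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # target video bitrate


# Hypothetical fixed ladder used as the starting point for every title.
BASE_LADDER = [
    Rendition(240, 400),
    Rendition(480, 1200),
    Rendition(720, 3000),
    Rendition(1080, 5800),
]


def per_title_ladder(complexity: float) -> list:
    """Scale the base ladder by a complexity score in [0, 1].

    0.0 means easy-to-compress content (e.g. flat animation) and
    1.0 means hard content (e.g. grainy action). The linear
    0.5x..1.5x mapping is an assumption for illustration.
    """
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must be in [0, 1]")
    scale = 0.5 + complexity
    return [
        Rendition(r.height, round(r.bitrate_kbps * scale))
        for r in BASE_LADDER
    ]
```

A simple cartoon (`complexity` near 0) gets the same resolutions at about half the bitrate, which is where the bandwidth saving comes from.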

Step 3 — Delivery: a custom CDN (Open Connect)

Delivery efficiency is the whole game at this bandwidth. Netflix runs Open Connect — its own CDN with appliances embedded inside ISPs (and IXPs):

  • Popular content is pre-positioned onto these appliances during off-peak hours.
  • A viewer streams from an appliance inside their ISP — minimal backbone traffic, lowest latency, huge cost savings.
  • The control plane steers each client to the best appliance (health, fill, proximity).

This “push content to the edge ahead of demand” is the key insight versus a pull CDN.
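The two halves of this design, off-peak fill and client steering, can be sketched as follows. This is a minimal illustration, not Open Connect's actual logic: the `Appliance` fields, the greedy popularity-first fill, and the use of round-trip time as the proximity signal are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Appliance:
    name: str
    healthy: bool
    rtt_ms: float                              # proxy for proximity to the client
    titles: set = field(default_factory=set)   # titles already pre-positioned


def plan_fill(popularity: dict, size_gb: dict, capacity_gb: float) -> list:
    """Greedy off-peak fill: take the most popular titles that still fit."""
    chosen, used = [], 0.0
    for title in sorted(popularity, key=popularity.get, reverse=True):
        if used + size_gb[title] <= capacity_gb:
            chosen.append(title)
            used += size_gb[title]
    return chosen


def steer(title: str, appliances: list) -> Optional[Appliance]:
    """Pick the nearest healthy appliance that already holds the title."""
    candidates = [a for a in appliances if a.healthy and title in a.titles]
    return min(candidates, key=lambda a: a.rtt_ms, default=None)
```

If `steer` returns `None`, a real system would fall back to a more distant tier; that path is left out here.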

Step 4 — Adaptive streaming

The same adaptive bitrate (ABR) approach as YouTube: short segments at multiple bitrates; the player starts low for instant playback and adapts per segment to measured bandwidth, which keeps rebuffering rare. Downloads store encrypted segments on the device for offline playback.
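A minimal sketch of the per-segment decision, assuming a simple throughput-based rule with a safety margin and a low-buffer guard. The function name, the 0.8 safety factor, and the 5-second threshold are illustrative choices; production players use more elaborate heuristics.

```python
def pick_bitrate(ladder_kbps, throughput_kbps, buffer_s,
                 safety=0.8, low_buffer_s=5.0):
    """Choose the bitrate for the next segment.

    Starts at the lowest rung when there is no throughput estimate yet
    or the buffer is nearly empty; otherwise picks the highest rung
    that fits within a fraction of the measured throughput.
    """
    ladder = sorted(ladder_kbps)
    if throughput_kbps is None or buffer_s < low_buffer_s:
        return ladder[0]
    budget = throughput_kbps * safety
    affordable = [b for b in ladder if b <= budget]
    return affordable[-1] if affordable else ladder[0]
```

For example, with a ladder of 400, 1200, 3000, and 5800 kbps, a measured 4000 kbps and a healthy buffer give a 3200 kbps budget, so the player requests the 3000 kbps rendition.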

Step 5 — Recommendations (a core product)

Netflix is famously recommendation-driven — most viewing comes from recommendations, not search. A pipeline (candidate generation + ML ranking over viewing history, similarity, context) personalizes rows and ranking per profile; even artwork/thumbnails are personalized and A/B tested.
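The two-stage shape of such a pipeline can be sketched in a few lines. Everything here is a stand-in: the `similar` lookup table replaces a learned similarity model, and the `score` callable replaces the ML ranker.

```python
from collections import Counter


def generate_candidates(history, similar, limit=50):
    """Candidate generation: titles similar to watched ones, minus watched.

    `similar` maps a title to a list of related titles (hypothetical
    precomputed table). Titles suggested by more watched items rank first.
    """
    watched = set(history)
    votes = Counter()
    for title in history:
        for related in similar.get(title, []):
            if related not in watched:
                votes[related] += 1
    return [t for t, _ in votes.most_common(limit)]


def rank(candidates, score):
    """Ranking: order candidates by a per-profile score, highest first."""
    return sorted(candidates, key=score, reverse=True)
```

Candidate generation keeps the expensive ranker from having to score the whole catalog; it only sees a short list per profile.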

Step 6 — Architecture (control vs data plane)

client → API (AWS): browse, search, recommendations, playback authz, bookmarks
       → Open Connect (CDN in ISPs): the actual video segments

Netflix famously splits a control plane in the cloud (microservices: catalog, recommendations, billing, playback authorization) from the data plane on Open Connect (the bytes). The cloud decides what and where; the appliances deliver.
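One common way to implement this kind of split is a short-lived signed URL: the control plane checks entitlement and signs a path, and the data plane verifies the signature locally without calling back to the cloud. The sketch below assumes that technique purely for illustration; it is not a description of Netflix's actual playback authorization, and the secret, host name, and URL layout are hypothetical.

```python
import hashlib
import hmac

SECRET = b"demo-secret"  # hypothetical key shared by cloud and appliances


def sign(path: str, expires: int) -> str:
    """HMAC over the path and expiry so neither can be altered."""
    msg = f"{path}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def authorize_playback(entitled: bool, title: str, appliance_host: str,
                       now: int, ttl_s: int = 300):
    """Control plane: decide what may be played and where to fetch it."""
    if not entitled:
        return None
    path = f"/{title}/manifest.mpd"
    expires = now + ttl_s
    return (f"https://{appliance_host}{path}"
            f"?exp={expires}&sig={sign(path, expires)}")


def appliance_accepts(path: str, expires: int, sig: str, now: int) -> bool:
    """Data plane: verify the token locally, then serve the bytes."""
    return now < expires and hmac.compare_digest(sig, sign(path, expires))
```

The point of the design is that the appliance needs no per-request call to the cloud, so the byte-serving path stays simple and fast.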

Step 7 — Resilience

Netflix pioneered chaos engineering (Chaos Monkey) — deliberately killing instances to prove the system degrades gracefully. Microservices with circuit breakers, fallbacks, and multi-region failover keep streaming up even when components fail.
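A circuit breaker with a fallback can be sketched as below. This is a minimal, single-threaded illustration with assumed thresholds, not any particular library's API: after a few consecutive failures it stops calling the failing dependency for a cool-down period and serves the fallback instead.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after_s=30.0,
                 clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, primary, fallback):
        # While open, skip the primary until the cool-down elapses.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()
            self.opened_at = None   # half-open: allow one trial call
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0           # success closes the circuit fully
        return result
```

For example, if the personalized-rows service is failing, `fallback` might return a generic popular-titles row, so the home screen still loads while the dependency recovers.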

Trade-offs to raise

  • Pre-encode optimally (more storage, but no real-time compute and the best quality per bit) vs on-the-fly (YouTube’s path for unknown uploads). Curated catalog → pre-encode.
  • Custom embedded CDN (huge efficiency, operational complexity) vs third-party CDN.
  • Recommendation-driven (engagement) vs search-driven discovery.

The interview cue

“Finite catalog → pre-encode per-title-optimally into ABR segments; deliver via a custom CDN embedded in ISPs (Open Connect) with content pushed to edges off-peak; a cloud control plane (microservices) handles browse/recommendations/playback authz while appliances serve bytes; ML recommendations drive discovery; chaos-tested resilience.” Pre-encoding + edge-embedded CDN + control/data split is the distinguishing answer; implementation next.