Designing YouTube · Lyte Code

A video platform at planetary scale — the upload/transcode pipeline, adaptive-bitrate streaming from a CDN, metadata, and view-count aggregation.

The problem

Design YouTube: users upload videos; anyone streams them smoothly on any device and network. The defining challenges are the transcoding pipeline (processing huge uploads into many formats) and adaptive-bitrate delivery at massive scale from a CDN. Watching dwarfs uploading, so the workload is overwhelmingly read- and bandwidth-heavy.

Step 1 — Requirements

Functional: upload videos; transcode to multiple resolutions/formats; stream with adaptive quality; search; recommendations; likes/comments/views; channels/subscriptions.

Non-functional: massive storage & bandwidth (exabytes, the dominant cost), low startup latency & smooth playback, scale to billions of views/day, global reach, high availability.

Step 2 — Estimate

Hundreds of hours uploaded per minute; billions of views/day.
Each video is stored as many renditions (resolutions × codecs), so total storage is several times the raw upload. That points to an object store plus a CDN for delivery, and a large compute fleet for transcoding.
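The estimate above can be turned into rough arithmetic. The sketch below is only illustrative: the upload rate, the average raw size per hour, and the rendition multiplier are all assumed numbers chosen to show the calculation, not measured figures.

```python
# Back-of-envelope storage estimate. Every input below is an assumed,
# illustrative figure, not a measured YouTube number.
HOURS_UPLOADED_PER_MINUTE = 500   # assumption: "hundreds of hours per minute"
RAW_GB_PER_HOUR = 1.0             # assumption: average raw upload size
RENDITION_MULTIPLIER = 3.0        # assumption: all stored copies total ~3x the raw size


def daily_storage_tb(hours_per_minute: float, raw_gb_per_hour: float, multiplier: float) -> float:
    """Return new storage needed per day, in terabytes (1 TB = 1000 GB)."""
    hours_per_day = hours_per_minute * 60 * 24
    raw_gb = hours_per_day * raw_gb_per_hour
    return raw_gb * multiplier / 1000


if __name__ == "__main__":
    tb = daily_storage_tb(HOURS_UPLOADED_PER_MINUTE, RAW_GB_PER_HOUR, RENDITION_MULTIPLIER)
    print(f"~{tb:,.0f} TB of new storage per day")
```

Under these assumptions the system adds roughly two petabytes per day, which is why storage and bandwidth dominate the cost.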

Step 3 — The upload + transcode pipeline (the core)

upload (resumable, chunked) → raw object store
   → split into segments → transcode fleet (parallel per segment)
   → renditions: 144p…4K × codecs (H.264/VP9/AV1), packaged as HLS/DASH segments
   → store renditions in object store → distribute to CDN
   → mark video "ready"; index metadata; thumbnails

Resumable, chunked upload (large files over flaky networks) to a blob store.
Transcoding is the heavy step: split the video into chunks and transcode them in parallel across a worker fleet into every rendition, then package the output into segmented formats (HLS/DASH) for adaptive streaming. The pipeline is asynchronous: the video is marked “ready” only once transcoding finishes.
Thumbnails and metadata extracted alongside.
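The parallel fan-out in the transcode step can be sketched as follows. This is a minimal illustration, not a real encoder integration: `transcode_segment`, the `video123` key prefix, and the three-entry rendition list are hypothetical stand-ins, and a thread pool takes the place of a distributed worker fleet.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

RENDITIONS = ["360p", "720p", "1080p"]  # illustrative subset of the full ladder


def transcode_segment(segment_id: int, rendition: str) -> str:
    """Hypothetical stand-in for a real encoder call; returns the output object key."""
    return f"video123/{rendition}/seg{segment_id:05d}.m4s"


def transcode_video(num_segments: int, max_workers: int = 8) -> dict[str, list[str]]:
    """Fan out one task per (segment, rendition) and regroup outputs by rendition."""
    tasks = list(product(range(num_segments), RENDITIONS))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so segments stay in playback order.
        keys = list(pool.map(lambda task: transcode_segment(*task), tasks))
    renditions: dict[str, list[str]] = {name: [] for name in RENDITIONS}
    for (_, rendition), key in zip(tasks, keys):
        renditions[rendition].append(key)
    return renditions


if __name__ == "__main__":
    for name, segment_keys in transcode_video(num_segments=3).items():
        print(name, segment_keys)
```

Because every (segment, rendition) pair is independent, wall-clock time is bounded by the slowest task rather than the length of the video, which is the point of splitting before transcoding.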

Step 4 — Adaptive-bitrate streaming + CDN

Playback uses adaptive bitrate (ABR): the video is split into short segments, each available at multiple bitrates; the player monitors bandwidth and switches quality per segment. It starts low for fast startup, climbs as throughput allows, and drops on congestion, trading picture quality for fewer rebuffering stalls.

Segments are served from a CDN near each viewer, which carries the bulk of the bandwidth; the origin store sees only cache misses, a small fraction of total traffic.
Pre-position popular content to the edge; long-tail pulled on demand.
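The per-segment quality decision can be sketched as a simple throughput rule. The bitrate ladder and the 0.8 safety factor below are assumed values for illustration; production players typically also factor in buffer level, which this sketch omits.

```python
# Bitrate ladder in kbit/s. The values are illustrative assumptions.
LADDER = {"144p": 100, "360p": 700, "720p": 2500, "1080p": 5000}


def pick_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition that fits within a safety margin of throughput.

    Falls back to the lowest rendition when nothing fits, so playback can
    always continue at some quality.
    """
    budget = measured_kbps * safety
    best = min(LADDER, key=LADDER.get)  # lowest-bitrate rendition as the floor
    for name, kbps in sorted(LADDER.items(), key=lambda item: item[1]):
        if kbps <= budget:
            best = name
    return best


if __name__ == "__main__":
    for throughput in (50, 1000, 4000, 10000):
        print(throughput, "kbit/s ->", pick_rendition(throughput))
```

The player would call `pick_rendition` before fetching each segment, using throughput measured over the previous few downloads.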

Step 5 — Metadata, search, recommendations

Metadata (title, channel, description, rendition URLs) in a sharded DB; search via an inverted index (Chapter 4 search lessons).
Recommendations — a pipeline like TikTok’s (candidate generation + ML ranking) for “up next” / home.
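For the search side, a toy inverted index over video titles shows the idea. This is a minimal in-memory sketch with whitespace tokenization and AND semantics; the video ids and titles are made up, and a real deployment would use a dedicated search engine with ranking.

```python
from collections import defaultdict


def build_index(titles: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase term to the set of video ids whose title contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for video_id, title in titles.items():
        for term in title.lower().split():
            index[term].add(video_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return ids of videos whose title contains every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    idx = build_index({"v1": "Cute cat video", "v2": "Cat training tips", "v3": "Dog video"})
    print(search(idx, "cat video"))
```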

Step 6 — View counts at scale

Counting billions of views can’t be a synchronous DB increment per view (hot-row contention). Stream view events to a queue → aggregate asynchronously into counters (approximate, eventually consistent). Same pattern as likes elsewhere.
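The asynchronous aggregation can be sketched as a consumer that buffers view events in memory and flushes batched deltas. `ViewAggregator` and its in-memory `store` are hypothetical stand-ins for a queue consumer and a durable counter table.

```python
from collections import Counter


class ViewAggregator:
    """Buffers view events and applies them to the store in batches."""

    def __init__(self) -> None:
        self.pending: Counter = Counter()  # deltas accumulated since the last flush
        self.store: Counter = Counter()    # stand-in for the durable counter table

    def on_event(self, video_id: str) -> None:
        """Handle one view event from the queue; no store write happens here."""
        self.pending[video_id] += 1

    def flush(self) -> int:
        """Apply one batched increment per video; return the number of writes."""
        writes = len(self.pending)
        self.store.update(self.pending)  # Counter.update adds counts
        self.pending.clear()
        return writes


if __name__ == "__main__":
    agg = ViewAggregator()
    for _ in range(1000):
        agg.on_event("viral-video")
    agg.on_event("niche-video")
    print(agg.flush(), "writes for 1001 views:", dict(agg.store))
```

A thousand views of one hot video collapse into a single increment, which removes the hot-row contention; the cost is that the displayed count lags until the next flush.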

Step 7 — Storage tiering

Most views hit a tiny fraction of videos. Tier storage by popularity: hot videos live on fast storage and the CDN; cold, long-tail videos move to cheaper archival storage, with their less-used renditions either kept there or regenerated on demand. This saves enormous cost.
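One way to decide which videos count as hot is to take the most-viewed videos until they cover a target share of all views. The sketch below assumes a 90% target and a two-tier split; both are illustrative choices, and the function name is hypothetical.

```python
def assign_tiers(views_by_video: dict[str, int], hot_view_share: float = 0.9) -> dict[str, str]:
    """Mark the most-viewed videos 'hot' until they cover hot_view_share of views.

    Everything after that point is marked 'cold'.
    """
    total = sum(views_by_video.values())
    tiers: dict[str, str] = {}
    covered = 0
    ranked = sorted(views_by_video.items(), key=lambda item: item[1], reverse=True)
    for video_id, views in ranked:
        if total > 0 and covered < hot_view_share * total:
            tiers[video_id] = "hot"
        else:
            tiers[video_id] = "cold"
        covered += views
    return tiers


if __name__ == "__main__":
    recent_views = {"a": 9500, "b": 300, "c": 150, "d": 50}
    print(assign_tiers(recent_views))
```

In this example a single video out of four accounts for 95% of views, so only it stays on the hot tier.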

Trade-offs to raise

Pre-transcode all renditions (storage cost, instant playback) vs on-the-fly (compute per view, cheaper storage). A common compromise: pre-transcode hot content and transcode cold content lazily.
ABR/segmented (smooth, complex) vs single file (simple, rebuffers).
CDN cost vs origin offload — CDN is mandatory at this scale.

The interview cue

“Resumable chunked upload → parallel segmented transcoding into many ABR renditions (HLS/DASH) → object store + CDN; players do adaptive bitrate per segment for smooth playback; metadata in sharded DBs with an inverted search index; view counts aggregated async; storage tiered by popularity.”

The transcoding pipeline and ABR/CDN delivery are the heart; implementation next.