Designing YouTube
A video platform at planetary scale — the upload/transcode pipeline, adaptive-bitrate streaming from a CDN, metadata, and view-count aggregation.
The problem
Design YouTube: users upload videos; anyone streams them smoothly on any device and network. The defining challenges are the transcoding pipeline (processing huge uploads into many formats) and adaptive-bitrate delivery at massive scale from a CDN. Viewing traffic dwarfs upload traffic, so the system is extremely read- and bandwidth-heavy.
Step 1 — Requirements
Functional: upload videos; transcode to multiple resolutions/formats; stream with adaptive quality; search; recommendations; likes/comments/views; channels/subscriptions.
Non-functional: massive storage & bandwidth (exabytes, the dominant cost), low startup latency & smooth playback, scale to billions of views/day, global reach, high availability.
Step 2 — Estimate
- Hundreds of hours uploaded per minute; billions of views/day.
- A video is stored as many renditions (resolutions × codecs), so storage is several times the raw upload size. This points to an object store fronted by a CDN, with transcoding handled by a large compute fleet.
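To make the estimate concrete, here is a back-of-envelope sketch. Every input is an illustrative assumption chosen for round numbers (500 hours uploaded per minute, about 1 GB of raw video per hour, renditions totalling 5× the raw size); none of these figures comes from this lesson or from YouTube.

```python
# Back-of-envelope storage estimate. All inputs are assumed, illustrative values.
HOURS_UPLOADED_PER_MINUTE = 500   # assumption standing in for "hundreds of hours per minute"
RAW_GB_PER_VIDEO_HOUR = 1.0       # assumption: average size of one hour of raw upload
RENDITION_MULTIPLIER = 5.0        # assumption: all renditions together are 5x the raw size


def daily_storage_tb(hours_per_minute: float, gb_per_hour: float, multiplier: float) -> float:
    """Return the new storage needed per day in terabytes (1 TB = 1000 GB)."""
    video_hours_per_day = hours_per_minute * 60 * 24
    raw_gb_per_day = video_hours_per_day * gb_per_hour
    return raw_gb_per_day * multiplier / 1000


daily_tb = daily_storage_tb(
    HOURS_UPLOADED_PER_MINUTE, RAW_GB_PER_VIDEO_HOUR, RENDITION_MULTIPLIER
)
print(f"{daily_tb:,.0f} TB/day, {daily_tb * 365 / 1000:,.0f} PB/year")
# prints "3,600 TB/day, 1,314 PB/year"
```

Even with these modest assumptions the system adds more than a petabyte of new data every year, which is why storage and bandwidth dominate the cost.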
Step 3 — The upload + transcode pipeline (the core)
upload (resumable, chunked) → raw object store
→ split into segments → transcode fleet (parallel per segment)
→ renditions: 144p…4K × codecs (H.264/VP9/AV1), packaged as HLS/DASH segments
→ store renditions in object store → distribute to CDN
→ mark video "ready"; index metadata; thumbnails
- Resumable, chunked upload (large files over flaky networks) to a blob store.
- Transcoding is the heavy step: split the video into chunks and transcode them in parallel across a worker fleet into every rendition, then package into segmented formats (HLS/DASH) for adaptive streaming. The pipeline is asynchronous: the video is marked “ready” only once transcoding finishes.
- Thumbnails and metadata extracted alongside.
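The fan-out in the transcode step can be sketched as follows. This is a minimal illustration, not a production pipeline: `transcode_segment` is a hypothetical stand-in for a real encoder call, the rendition ladder, 4-second segment length, and object-key layout are assumptions, and a thread pool stands in for a distributed worker fleet. A real splitter would also cut on keyframe boundaries so each segment can be decoded independently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

RESOLUTIONS = ["144p", "360p", "720p", "1080p"]  # assumed subset of the ladder
CODECS = ["h264", "vp9"]                          # assumed subset of codecs


def split_into_segments(duration_s: float, segment_s: float = 4.0) -> list[tuple[float, float]]:
    """Return (start, end) time ranges that cover the whole video."""
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + segment_s, duration_s)
        segments.append((start, end))
        start = end
    return segments


def transcode_segment(job: tuple[str, int, str, str]) -> str:
    """Hypothetical stand-in for an encoder call; returns the output object key."""
    video_id, index, resolution, codec = job
    return f"{video_id}/{codec}/{resolution}/seg_{index:05d}.m4s"


def transcode_video(video_id: str, duration_s: float, workers: int = 8) -> dict:
    """Fan out one job per (segment, resolution, codec) and wait for all of them."""
    segments = split_into_segments(duration_s)
    jobs = [
        (video_id, index, resolution, codec)
        for index, _ in enumerate(segments)
        for resolution, codec in product(RESOLUTIONS, CODECS)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        keys = list(pool.map(transcode_segment, jobs))
    # The video is marked ready only after every job has finished.
    return {"video_id": video_id, "status": "ready", "segment_keys": keys}


result = transcode_video("vid123", duration_s=10.0)
print(result["status"], len(result["segment_keys"]))  # prints "ready 24"
```

A 10-second video yields 3 segments, and 3 segments × 4 resolutions × 2 codecs gives 24 independent jobs, which is what lets the fleet scale horizontally.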
Step 4 — Adaptive-bitrate streaming + CDN
Playback uses adaptive bitrate (ABR): the video is split into short segments, each available at multiple bitrates; the player monitors bandwidth and switches quality per segment. It starts low for fast startup, climbs as throughput allows, and drops on congestion, so playback degrades in quality rather than stalling to rebuffer.
- Segments are served from a CDN (the bulk of bandwidth) near each viewer; the origin store serves only cache misses, a small fraction of total traffic.
- Pre-position popular content to the edge; long-tail pulled on demand.
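The per-segment quality decision can be sketched as a simple throughput-based rule. This is one possible heuristic, not the algorithm any particular player uses: the bitrate ladder and the 0.8 safety factor are assumed values, and real players typically also take buffer level into account.

```python
BITRATE_LADDER_KBPS = [300, 750, 1500, 3000, 6000]  # assumed ladder, ascending


def pick_bitrate(measured_kbps: float,
                 ladder: list[int] = BITRATE_LADDER_KBPS,
                 safety: float = 0.8) -> int:
    """Return the highest bitrate that fits within safety * measured throughput."""
    budget = measured_kbps * safety
    affordable = [bitrate for bitrate in ladder if bitrate <= budget]
    # Fall back to the lowest rung when even that exceeds the budget.
    return affordable[-1] if affordable else ladder[0]


def plan_playback(throughput_samples_kbps: list[float]) -> list[int]:
    """Choose one bitrate per segment, starting at the lowest rung for fast startup."""
    choices = [BITRATE_LADDER_KBPS[0]]
    for measured in throughput_samples_kbps:
        choices.append(pick_bitrate(measured))
    return choices


print(plan_playback([2000, 5000, 900]))  # prints "[300, 1500, 3000, 300]"
```

The sample run shows the behaviour described above: the first segment is fetched at the lowest bitrate, quality climbs as measured throughput rises, and it falls back when throughput collapses to 900 kbps.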
Step 5 — Metadata, search, recommendations
- Metadata (title, channel, description, rendition URLs) lives in a sharded DB; search runs over an inverted index (see the Chapter 4 search lessons).
- Recommendations — a pipeline like TikTok’s (candidate generation + ML ranking) for “up next” / home.
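As a reminder of how the inverted index behind search works, here is a toy in-memory sketch. The `InvertedIndex` class and its whitespace tokenizer are illustrative assumptions; a real search tier would add stemming, ranking, and sharding.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each term to the set of video IDs whose text contains it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)

    def add(self, video_id: str, text: str) -> None:
        """Index a video's title or description, one lowercase term at a time."""
        for term in text.lower().split():
            self._postings[term].add(video_id)

    def search(self, query: str) -> set[str]:
        """Return the videos that contain every term in the query (AND semantics)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


index = InvertedIndex()
index.add("v1", "System design interview YouTube")
index.add("v2", "YouTube transcoding pipeline")
print(sorted(index.search("youtube")))         # prints "['v1', 'v2']"
print(sorted(index.search("youtube design")))  # prints "['v1']"
```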
Step 6 — View counts at scale
Counting billions of views cannot be a synchronous DB increment per view: a popular video would turn its counter row into a contention hotspot. Instead, stream view events to a queue and aggregate them asynchronously into counters (approximate, eventually consistent). Likes elsewhere use the same pattern.
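The aggregation step can be sketched like this. It is a minimal single-process illustration: a standard-library `queue.Queue` stands in for a durable message queue, a plain dict stands in for the counter store, and the function names and batch size are assumptions.

```python
import queue
from collections import Counter


def drain_and_aggregate(events: queue.Queue, max_batch: int = 1000) -> Counter:
    """Pull up to max_batch view events off the queue and count them per video."""
    batch: Counter = Counter()
    for _ in range(max_batch):
        try:
            video_id = events.get_nowait()
        except queue.Empty:
            break
        batch[video_id] += 1
    return batch


def flush(batch: Counter, store: dict[str, int]) -> None:
    """Apply one increment per video instead of one write per view."""
    for video_id, delta in batch.items():
        store[video_id] = store.get(video_id, 0) + delta


events: queue.Queue = queue.Queue()
for video_id in ["a", "b", "a", "a", "b"]:
    events.put(video_id)

view_counts: dict[str, int] = {}
flush(drain_and_aggregate(events), view_counts)
print(view_counts)  # prints "{'a': 3, 'b': 2}"
```

Five view events become two writes here; for a viral video the same batching collapses thousands of events into a single increment, at the cost of counts lagging slightly behind reality.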
Step 7 — Storage tiering
Most views hit a tiny fraction of videos. Tier storage by popularity: hot videos on fast storage + CDN; cold/long-tail videos on cheaper archival storage, with their transcoded renditions generated lazily on demand rather than all kept up front. This saves enormous cost.
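A tiering decision of this kind might look like the sketch below. The `StoragePolicy` type, the single views-per-day signal, and the threshold of 1,000 are all illustrative assumptions; a real system would likely use more signals and more tiers.

```python
from dataclasses import dataclass

HOT_VIEWS_PER_DAY = 1_000  # assumed threshold separating hot from cold


@dataclass(frozen=True)
class StoragePolicy:
    tier: str                 # "hot" (fast storage + CDN) or "cold" (archival)
    pretranscode_all: bool    # keep every rendition vs. generate lazily


def choose_policy(views_per_day: int) -> StoragePolicy:
    """Pick a storage tier and rendition strategy from recent popularity."""
    if views_per_day >= HOT_VIEWS_PER_DAY:
        return StoragePolicy(tier="hot", pretranscode_all=True)
    return StoragePolicy(tier="cold", pretranscode_all=False)


print(choose_policy(250_000))  # prints "StoragePolicy(tier='hot', pretranscode_all=True)"
print(choose_policy(3))        # prints "StoragePolicy(tier='cold', pretranscode_all=False)"
```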
Trade-offs to raise
- Pre-transcode all renditions (storage cost, instant playback) vs on-the-fly (compute per view, cheaper storage). Pre-transcode hot content, lazy for cold.
- ABR/segmented (smooth, complex) vs single file (simple, rebuffers).
- CDN cost vs origin offload — CDN is mandatory at this scale.
The interview cue
“Resumable chunked upload → parallel segmented transcoding into many ABR renditions (HLS/DASH) → object store + CDN; players do adaptive bitrate per segment for smooth playback; metadata in sharded DBs with an inverted search index; view counts aggregated async; storage tiered by popularity.”
Transcoding pipeline + ABR/CDN delivery is the heart; implementation next.