Building YouTube
Implement the parallel segment-transcode DAG, ABR packaging and manifests, the CDN-served playback path, and async view counting.
Resumable chunked upload
Large videos upload in chunks (resume on failure), straight to the blob store:
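import hashlib

def upload_chunk(video_id, index, data, checksum):
    # Integrity per chunk; checksum is assumed to be a hex SHA-256 digest.
    if hashlib.sha256(data).hexdigest() != checksum:
        raise ValueError(f"checksum mismatch for chunk {index}")  # client re-sends this chunk
    blob_store.put(f"raw/{video_id}/{index}", data)
    progress.mark(video_id, index)
    if progress.complete(video_id):
        transcode_queue.publish({"video_id": video_id})  # kick off processing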
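The resume step can be sketched as follows: after a failure the client asks which chunk indices have landed and sends only the rest. This is a minimal sketch, not the actual upload protocol; `chunk_checksum` and `missing_chunks` are hypothetical helper names, and the set of received indices is assumed to come from the progress tracker.

```python
import hashlib


def chunk_checksum(data: bytes) -> str:
    """Hex SHA-256 digest the client sends alongside each chunk."""
    return hashlib.sha256(data).hexdigest()


def missing_chunks(total_chunks: int, received: set[int]) -> list[int]:
    """Indices the client still has to send, in upload order."""
    return [i for i in range(total_chunks) if i not in received]


# A 5-chunk upload that was interrupted after chunks 0, 1 and 3 landed.
received = {0, 1, 3}
print(missing_chunks(5, received))  # [2, 4]
```

Because each chunk is written under its own key, re-sending a chunk that already landed simply overwrites identical bytes.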
The transcode pipeline (parallel DAG)
Split the raw video into segments and transcode them in parallel across a worker fleet — the slow step, made fast by fan-out:
def transcode(video_id):
    raw = assemble(blob_store.list(f"raw/{video_id}"))
    segments = split(raw, seconds=4)  # GOP-aligned segments
    jobs = []
    for seg in segments:
        for rendition in RENDITIONS:  # 144p..4K × {H.264, VP9, AV1}
            jobs.append(("transcode", seg, rendition))
    run_parallel(jobs, fleet=transcode_workers)  # thousands of small jobs
    package_hls_dash(video_id)  # manifests + segment files
    cdn.distribute(video_id)
    metadata.set_status(video_id, "ready")  # now playable
Each job is small and independent, so with enough workers the wall-clock time approaches that of the slowest single job (plus splitting and packaging overhead) instead of growing with the video's full length.
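The fan-out can be illustrated on a single machine with a thread pool standing in for the worker fleet. This is only a sketch under stated assumptions: `transcode_segment` is a hypothetical stub that returns an output key instead of running an encoder, and the three-entry `RENDITIONS` list is an illustrative subset of the ladder.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

RENDITIONS = ["360p", "720p", "1080p"]  # hypothetical subset of the ladder


def transcode_segment(segment: int, rendition: str) -> str:
    """Stand-in for one encoder job; returns the output object key."""
    return f"out/{rendition}/seg{segment}.ts"


def transcode_all(segment_count: int, workers: int = 8) -> list[str]:
    # One job per (segment, rendition) pair; every job is independent.
    jobs = list(product(range(segment_count), RENDITIONS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in job order even if jobs finish out of order.
        return list(pool.map(lambda job: transcode_segment(*job), jobs))


outputs = transcode_all(segment_count=4)
print(len(outputs))  # 12 jobs: 4 segments x 3 renditions
print(outputs[0])    # out/360p/seg0.ts
```

In a real deployment the pool would be a queue feeding many machines, but the shape is the same: the job list is the cross product of segments and renditions.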
ABR packaging and manifests
Package the renditions into HLS/DASH. A master manifest lists the available qualities, each quality's playlist lists its segment URLs, and the player picks segments adaptively:
master.m3u8 # lists variants: 144p,360p,720p,1080p,4K
720p/index.m3u8 # segment list: seg0.ts, seg1.ts, ...
1080p/index.m3u8
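As a rough illustration of what the master manifest contains, the sketch below builds a minimal HLS master playlist using the `#EXTM3U` and `#EXT-X-STREAM-INF` tags. The `master_playlist` helper is hypothetical, and the bitrates and resolutions are illustrative values, not ones taken from this design.

```python
def master_playlist(variants: list[tuple[str, int, str]]) -> str:
    """Build a minimal HLS master playlist.

    variants: (name, peak bandwidth in bits per second, "WIDTHxHEIGHT").
    """
    lines = ["#EXTM3U"]
    for name, bandwidth, resolution in variants:
        # Each variant is a STREAM-INF tag followed by the URI of its playlist.
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"


print(master_playlist([
    ("720p", 2_800_000, "1280x720"),    # illustrative bitrate
    ("1080p", 5_000_000, "1920x1080"),  # illustrative bitrate
]))
```

The player reads the `BANDWIDTH` attribute of each variant to decide which per-quality playlist to fetch next.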
The playback path (CDN + adaptive bitrate)
def play(video_id):
    meta = metadata.get(video_id)
    if meta.status != "ready":
        return 409  # still processing (425 Too Early is reserved for TLS early-data replay)
    return {"manifest": cdn_url(f"{video_id}/master.m3u8")}  # player takes over via CDN

# client: fetch manifest → start at a low bitrate (fast start) →
#         measure bandwidth/buffer → step quality up/down per segment
Segments stream from the CDN, so the origin store is barely touched. The player's ABR logic starts low for a fast start, climbs to the best sustainable quality, and drops quality on congestion to avoid rebuffering.
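The per-segment decision can be sketched as a simple throughput rule with a buffer guard. Real players use more elaborate heuristics; this is a minimal sketch in which the `LADDER` bitrates, the 0.8 safety factor, and the 5-second low-buffer threshold are all assumed values.

```python
LADDER = [  # (name, bitrate in kbit/s), illustrative values, lowest first
    ("144p", 200),
    ("360p", 800),
    ("720p", 2800),
    ("1080p", 5000),
]


def pick_rendition(throughput_kbps: float, buffer_s: float,
                   safety: float = 0.8, low_buffer_s: float = 5.0) -> str:
    """Choose the quality for the next segment."""
    if buffer_s < low_buffer_s:
        return LADDER[0][0]            # nearly empty buffer: refill at the lowest quality
    budget = throughput_kbps * safety  # keep headroom below measured throughput
    choice = LADDER[0][0]
    for name, kbps in LADDER:
        if kbps <= budget:
            choice = name              # highest rung that still fits the budget
    return choice


print(pick_rendition(4000, buffer_s=20))   # 720p
print(pick_rendition(10000, buffer_s=2))   # 144p
```

Starting playback with an empty buffer naturally selects the lowest rung, which matches the fast-start behavior described above.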
Async view counting
def on_view(video_id, user, position):
    # position: seconds watched so far (assumed unit, matching the N-second rule below)
    view_stream.publish({"video": video_id, "user": user, "ts": now(), "position": position})

# stream/batch workers → increment counters in a time-series store (approximate)
# a view is counted only after N seconds watched (anti-fraud)
Never run UPDATE videos SET views = views + 1 for every view: a viral video turns that row into a write hotspot. Aggregate the stream and write the counts in batches.
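A batch worker along these lines might look like the sketch below. The event shape (`video`, `user`, `position` in seconds), the 30-second threshold, and the one-view-per-user-per-batch rule are assumptions made for illustration; the article only specifies that a view counts after N seconds watched.

```python
from collections import Counter

MIN_WATCH_SECONDS = 30  # hypothetical value for N


def aggregate_views(events: list[dict]) -> Counter:
    """Collapse a batch of view events into one increment per video."""
    counted = set()    # (video, user) pairs already counted in this batch
    views = Counter()
    for event in events:
        key = (event["video"], event["user"])
        if event["position"] >= MIN_WATCH_SECONDS and key not in counted:
            counted.add(key)
            views[event["video"]] += 1
    return views


batch = [
    {"video": "v1", "user": "a", "position": 45},
    {"video": "v1", "user": "a", "position": 90},  # same viewer, not counted twice
    {"video": "v1", "user": "b", "position": 10},  # below the threshold
    {"video": "v2", "user": "a", "position": 31},
]
print(aggregate_views(batch))  # Counter({'v1': 1, 'v2': 1})
```

The worker then issues one increment per video per batch, so write volume scales with the number of distinct videos rather than the number of views.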
Storage and metadata
- Renditions in the object store (erasure-coded), fronted by CDN; tier cold videos to cheaper storage and lazily (re)transcode rare ones.
- Metadata sharded by video id; search index updated async; thumbnails in the blob store + CDN.
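Sharding metadata by video id can be sketched with a stable hash. This is one possible scheme, not necessarily the one intended here; `shard_for` and the shard count of 16 are hypothetical. A cryptographic digest is used because Python's built-in `hash()` is salted per process for strings and would route the same id differently after a restart.

```python
import hashlib


def shard_for(video_id: str, shard_count: int) -> int:
    """Map a video id to a shard index that is stable across processes."""
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % shard_count


print(shard_for("video-123", 16))
```

Plain modulo placement moves most keys when the shard count changes, so a design that expects frequent resharding might prefer consistent hashing instead.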
Scale and failure handling
- Transcode fleet autoscales on queue depth; a failed segment job retries (idempotent by segment+rendition).
- Hot video (viral) → CDN absorbs it; pre-position to more edges.
- CDN miss / cold long-tail → pulled from origin once, then cached.
- Playback on bad network → ABR drops quality; segments are small and retryable.
- Upload failure → resumable chunks restart from the missing piece.
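The retry behavior for a failed segment job can be sketched as below. The idea is that a job is keyed by segment and rendition, so a duplicate delivery or a retry never produces a second output. `run_idempotent`, the in-memory `done` map, and the use of `RuntimeError` as the transient failure type are all assumptions for illustration; a real fleet would record completion in shared storage.

```python
def run_idempotent(job_key: str, work, done: dict, max_attempts: int = 3):
    """Run work() at most once per job_key, retrying transient failures."""
    if job_key in done:              # duplicate delivery: reuse the stored result
        return done[job_key]
    last_error = None
    for _ in range(max_attempts):
        try:
            done[job_key] = work()
            return done[job_key]
        except RuntimeError as err:  # hypothetical transient failure type
            last_error = err
    raise RuntimeError(f"{job_key} failed after {max_attempts} attempts") from last_error


attempts = {"n": 0}


def flaky():
    """Fails on the first call, succeeds afterwards."""
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise RuntimeError("worker lost")
    return "out/720p/seg7.ts"


done = {}
print(run_idempotent("seg7:720p", flaky, done))  # out/720p/seg7.ts
print(run_idempotent("seg7:720p", flaky, done))  # out/720p/seg7.ts (from the done map)
print(attempts["n"])                             # 2
```

Because the output key is derived from the job key, even a retry that races with a slow original attempt writes the same object rather than a duplicate.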
The takeaway
Concrete signals: chunked resumable upload, parallel per-segment transcoding into ABR renditions (HLS/DASH), CDN-served adaptive playback, async view counting, and popularity-tiered storage. The transcode-fan-out + ABR/CDN delivery is the reusable video backbone — Netflix is the curated-catalog variation.