Building YouTube · Lyte Code

Implement the parallel segment-transcode DAG, ABR packaging and manifests, the CDN-served playback path, and async view counting.

Resumable chunked upload

Large videos upload in chunks (resume on failure), straight to the blob store:

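import hashlib

def upload_chunk(video_id, index, data, checksum):
    # integrity per chunk; checksum is assumed to be a hex SHA-256 digest
    if hashlib.sha256(data).hexdigest() != checksum:
        raise ValueError(f"checksum mismatch for chunk {index}")   # client re-sends this chunk
    blob_store.put(f"raw/{video_id}/{index}", data)   # same key on retry, so re-sends are harmless
    progress.mark(video_id, index)
    if progress.complete(video_id):
        transcode_queue.publish({"video_id": video_id})   # kick off processing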
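The resume step can be sketched as a small progress tracker that reports which chunk indices are still missing. This is a minimal in-memory stand-in, not the real service: the `UploadProgress` class and its method names are hypothetical, and a production tracker would persist this state.

```python
class UploadProgress:
    """In-memory stand-in for the progress tracker (hypothetical)."""

    def __init__(self, total_chunks):
        self.total = total_chunks
        self.done = set()

    def mark(self, index):
        if not 0 <= index < self.total:
            raise IndexError(f"chunk index {index} out of range")
        # Adding to a set is idempotent, so a re-sent chunk changes nothing.
        self.done.add(index)

    def complete(self):
        return len(self.done) == self.total

    def missing(self):
        # Indices the client still has to send, in upload order.
        return [i for i in range(self.total) if i not in self.done]
```

After a dropped connection the client would ask for `missing()` and send only those chunks instead of restarting the upload.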

The transcode pipeline (parallel DAG)

Split the raw video into segments and transcode them in parallel across a worker fleet — the slow step, made fast by fan-out:

def transcode(video_id):
    raw = assemble(blob_store.list(f"raw/{video_id}"))
    segments = split(raw, seconds=4)                # GOP-aligned segments
    jobs = []
    for seg in segments:
        for rendition in RENDITIONS:                # 144p..4K × {H.264, VP9, AV1}
            jobs.append(("transcode", seg, rendition))
    run_parallel(jobs, fleet=transcode_workers)     # thousands of small jobs
    package_hls_dash(video_id)                       # manifests + segment files
    cdn.distribute(video_id)
    metadata.set_status(video_id, "ready")           # now playable

Each job is small and independent → with enough workers, wall-clock time approaches that of the slowest segment job (plus split and packaging overhead) instead of growing with the video's length.
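The fan-out can be sketched locally with a thread pool standing in for the worker fleet. Everything here is an assumption for illustration: the rendition names, the `out/...` key layout, and the dict used as an output store. A real fleet would pull jobs from a queue across machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

RENDITIONS = ["360p/h264", "720p/h264", "1080p/vp9"]  # illustrative subset

def build_jobs(video_id, segment_count, renditions=RENDITIONS):
    # One job per (segment, rendition) pair.
    return [(video_id, seg, r) for seg, r in product(range(segment_count), renditions)]

def output_key(job):
    # Deterministic key: re-running a job overwrites the same object.
    video_id, seg, rendition = job
    return f"out/{video_id}/{rendition}/seg{seg}"

def run_parallel(jobs, transcode_one, store, max_workers=8):
    # Skip jobs whose output already exists, so a rerun only does missing work.
    pending = [j for j in jobs if output_key(j) not in store]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for job, result in zip(pending, pool.map(transcode_one, pending)):
            store[output_key(job)] = result
    return len(pending)
```

Keying each output by segment and rendition is what makes a retried or duplicated job safe: it writes the same object again rather than a second copy.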

ABR packaging and manifests

Package the renditions into HLS/DASH: a master manifest lists the available qualities, each quality's playlist lists its segment URLs, and the player picks segments adaptively:

master.m3u8         # lists variants: 144p,360p,720p,1080p,4K
  720p/index.m3u8   # segment list: seg0.ts, seg1.ts, ...
  1080p/index.m3u8
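A minimal sketch of generating the two HLS playlist levels shown above. The tags are standard HLS, but the bandwidth and resolution values are illustrative assumptions, and real packagers emit more attributes (codecs, frame rate) than this.

```python
VARIANTS = [
    # (name, peak bits per second, resolution): illustrative values only
    ("360p", 800_000, "640x360"),
    ("720p", 2_800_000, "1280x720"),
    ("1080p", 5_000_000, "1920x1080"),
]

def master_playlist(variants=VARIANTS):
    # One STREAM-INF entry per quality, each pointing at that quality's playlist.
    lines = ["#EXTM3U"]
    for name, bandwidth, resolution in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"

def media_playlist(segment_count, seconds=4):
    # Per-quality playlist: fixed-length segments, closed with ENDLIST (VOD).
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{seconds}",
        "#EXT-X-MEDIA-SEQUENCE:0",
    ]
    for i in range(segment_count):
        lines.append(f"#EXTINF:{seconds:.1f},")
        lines.append(f"seg{i}.ts")
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"
```

The player reads the master playlist once, then fetches whichever per-quality playlist matches its current bandwidth estimate.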

The playback path (CDN + adaptive bitrate)

def play(video_id):
    meta = metadata.get(video_id)
    if meta.status != "ready": return 409           # still processing, not playable yet
    return {"manifest": cdn_url(f"{video_id}/master.m3u8")}   # player takes over via CDN
# client: fetch manifest → start at a low bitrate (fast start) →
#         measure bandwidth/buffer → step quality up/down per segment

Segments stream from the CDN; the origin store is barely touched. The player's ABR logic starts low (fast start), climbs to the best sustainable quality, and drops on congestion to avoid rebuffering.
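The client-side ABR decision can be sketched as a per-segment choice over a bitrate ladder. The ladder values, the 0.8 safety factor, and the 5-second low-buffer threshold are illustrative assumptions, not values from any real player.

```python
LADDER = [  # (name, bits per second), lowest first; illustrative values
    ("144p", 200_000),
    ("360p", 800_000),
    ("720p", 2_800_000),
    ("1080p", 5_000_000),
]

def pick_rendition(throughput_bps, buffer_s, current,
                   ladder=LADDER, safety=0.8, low_buffer_s=5.0):
    """Return the ladder index to use for the next segment."""
    if buffer_s < low_buffer_s:
        # Buffer nearly empty: step down to refill before playback stalls.
        return max(current - 1, 0)
    budget = throughput_bps * safety  # leave headroom below measured throughput
    best = 0
    for i, (_, bitrate) in enumerate(ladder):
        if bitrate <= budget:
            best = i
    # Climb at most one step per segment; drop straight to what is sustainable.
    return min(best, current + 1)
```

Starting at index 0 gives the fast start; the one-step climb keeps a single optimistic bandwidth sample from jumping straight to the top rendition.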

Async view counting

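def on_view(video_id, user, position):
    # position is assumed to be seconds watched so far (needed for the N-second rule below)
    view_stream.publish({"video": video_id, "user": user, "ts": now(), "position_s": position})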
# stream/batch workers → increment counters in a time-series store (approximate)
# a view is counted only after N seconds watched (anti-fraud)

Never run UPDATE videos SET views = views + 1 per view: a viral video turns that one row into a write hotspot. Aggregate the stream instead.
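Aggregating the stream can be sketched as a batch step that filters short views and folds the rest into one increment per video. The `watched_s` field name and the 30-second threshold are assumptions (the text only says "N seconds"), and a dict stands in for the counter store.

```python
from collections import Counter

MIN_WATCH_SECONDS = 30  # assumed value for "N"

def aggregate(events, min_watch_s=MIN_WATCH_SECONDS):
    # Count only events that pass the minimum-watch rule.
    counts = Counter()
    for event in events:
        if event["watched_s"] >= min_watch_s:
            counts[event["video"]] += 1
    return counts

def flush(counts, store):
    # One write per video per batch, instead of one write per view.
    for video, n in counts.items():
        store[video] = store.get(video, 0) + n
```

A batch of a million events for one viral video becomes a single increment, which is the whole point of moving the counting off the request path.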

Storage and metadata

Renditions in the object store (erasure-coded), fronted by CDN; tier cold videos to cheaper storage, and for rarely watched videos transcode missing renditions on demand instead of storing them all.
Metadata sharded by video id; search index updated async; thumbnails in the blob store + CDN.

Scale and failure handling

Transcode fleet autoscales on queue depth; a failed segment job retries (idempotent by segment+rendition).
Hot video (viral) → CDN absorbs it; pre-position to more edges.
CDN miss / cold long-tail → pulled from origin once, then cached.
Playback on bad network → ABR drops quality; segments are small and retryable.
Upload failure → resumable chunks restart from the missing piece.
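The retry behaviour for a failed segment job can be sketched as a bounded retry loop with exponential backoff. The attempt count and delays are illustrative assumptions, and this version retries on any exception where real code would retry only errors known to be transient.

```python
import time

def run_with_retry(job, attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call job() until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        try:
            return job()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure
            # Back off 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay_s * 2 ** attempt)
```

Retrying blindly is only safe because each job is idempotent by segment and rendition: a job that actually succeeded before its acknowledgement was lost just rewrites the same output.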

The takeaway

Concrete signals: chunked resumable upload, parallel per-segment transcoding into ABR renditions (HLS/DASH), CDN-served adaptive playback, async view counting, and popularity-tiered storage. The transcode-fan-out + ABR/CDN delivery is the reusable video backbone — Netflix is the curated-catalog variation.