Building Netflix · Lyte Code

Implement the playback authorization + manifest flow, edge steering to Open Connect appliances, off-peak content pre-positioning, and graceful fallback.

The play flow: cloud authorizes, edge delivers

The control plane (cloud) authorizes playback and hands back a manifest pointing at the best edge appliance; the data plane (Open Connect) streams the bytes:

def play(user, title_id, device):
    if not entitled(user, title_id):                     # subscription/region check
        return 403
    profile = current_profile(user)
    renditions = catalog.renditions(title_id, device)    # device-appropriate ABR set
    appliance = steer(user, title_id)                    # pick best Open Connect server
    manifest = build_manifest(renditions, base=appliance.url)
    bookmarks.touch(profile, title_id)                   # for resume
    return {"manifest": manifest, "drm_license_url": drm.url(user, title_id)}
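To make the manifest step concrete, here is a minimal sketch of what build_manifest might do: attach the chosen appliance's base URL to each rendition and order the ladder by bitrate. The Rendition shape, the field names, and the example hostname are assumptions for illustration; the result is a simplified dictionary, not a real DASH MPD or HLS playlist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One rung of the ABR ladder (hypothetical shape for this sketch)."""
    bitrate_kbps: int
    height: int
    path: str  # location of the rendition's segment index, relative to the appliance


def build_manifest(renditions, base):
    """Return a simplified manifest: renditions sorted by bitrate, with absolute URLs."""
    base = base.rstrip("/")
    return {
        "base_url": base,
        "renditions": [
            {
                "bitrate_kbps": r.bitrate_kbps,
                "height": r.height,
                "url": f"{base}/{r.path.lstrip('/')}",
            }
            for r in sorted(renditions, key=lambda r: r.bitrate_kbps)
        ],
    }
```

Because only the base URL depends on the steering decision, the same rendition list can be pointed at a different appliance without touching the catalog.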

Edge steering

The steering service picks the appliance that minimizes cost and latency: one embedded in the user’s ISP if it holds the title, otherwise one at a nearby internet exchange point (IXP), after accounting for health and load:

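def steer(user, title_id):
    candidates = appliances_with(title_id, near=user.ip)  # ISP-embedded first
    healthy = [a for a in candidates if a.healthy]        # a.healthy: assumed health-check flag
    # False sorts before True, so in-ISP appliances win; ties break on load, then RTT
    return min(healthy, key=lambda a: (not a.in_user_isp(), a.load, a.rtt(user)))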
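The selection rule can be exercised end to end with an in-memory fleet. The sketch below is self-contained; the Appliance fields (isp, load, rtt_ms, healthy, titles) and the choice to return None when nothing qualifies are assumptions made for illustration, not a description of the real steering service.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Appliance:
    name: str
    isp: Optional[str]        # ISP hosting the appliance; None for an IXP site
    load: float               # fraction of capacity in use, 0.0 to 1.0
    rtt_ms: float             # measured round-trip time to the user
    healthy: bool = True
    titles: set = field(default_factory=set)


def steer(user_isp, title_id, appliances):
    """Pick a healthy appliance holding the title: in-ISP first, then load, then RTT."""
    candidates = [a for a in appliances if a.healthy and title_id in a.titles]
    if not candidates:
        return None  # caller decides what to do when no appliance qualifies
    # Tuples compare left to right, and False sorts before True.
    return min(candidates, key=lambda a: (a.isp != user_isp, a.load, a.rtt_ms))
```

Note the consequence of the tuple key: ISP locality strictly dominates load, so a busy in-ISP appliance beats an idle IXP one. A real system would likely cap this with a load threshold or a weighted score.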

Off-peak pre-positioning

Because the catalog and its popularity (by region) are predictable, push content to appliances during off-peak hours rather than pulling on demand:

def nightly_fill(appliance):
    predicted = popularity_model.top_titles(region=appliance.region, horizon="24h")
    for title in predicted:
        for rendition in title.renditions:
            if rendition not in appliance:
                appliance.prefetch(rendition)             # fill during low-traffic window

At peak, the bytes are already inside the ISP, so playback adds almost no backbone traffic.
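A fill job also has to respect two constraints the loop above leaves implicit: the off-peak window and the appliance's free disk space. The sketch below shows one way to handle both, using a greedy plan ordered by predicted popularity. The window times, the (rendition_id, size_bytes) input shape, and both function names are assumptions for illustration.

```python
from datetime import time


def in_fill_window(now, start=time(2, 0), end=time(8, 0)):
    """True if `now` falls in the fill window; handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def plan_fill(predicted, cached, free_bytes):
    """Greedy plan: walk renditions from most to least popular, taking each one that fits.

    predicted: list of (rendition_id, size_bytes), most popular first.
    cached: set of rendition ids already on the appliance.
    """
    plan = []
    for rendition_id, size in predicted:
        if rendition_id in cached:
            continue              # already on disk
        if size > free_bytes:
            continue              # too big for the remaining space; smaller items may still fit
        plan.append(rendition_id)
        free_bytes -= size
    return plan
```

Greedy by popularity is the simplest policy. It ignores eviction of stale content, which a real fill system would also need to decide.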

Per-title encoding (offline)

Encoding runs once per title, offline, with bitrates tuned to the content: a cartoon needs far less bitrate than an action film at the same perceived quality. The output is the full ABR ladder as segmented HLS/DASH, stored and distributed to appliances. There is no real-time transcode path.
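One way to picture per-title tuning: run trial encodes at several bitrates, score each for quality, then keep the cheapest encode that meets each quality target. The sketch below is a simplified stand-in for that idea, not the production algorithm. The quality scores and bitrates are made-up numbers, and build_ladder is a hypothetical helper.

```python
def build_ladder(trial_encodes, quality_targets):
    """For each quality target, keep the lowest-bitrate trial encode that reaches it.

    trial_encodes: list of (bitrate_kbps, height, quality_score) tuples.
    """
    ladder = []
    for target in quality_targets:
        good_enough = [e for e in trial_encodes if e[2] >= target]
        if not good_enough:
            continue                      # no trial encode reaches this target
        rung = min(good_enough, key=lambda e: e[0])
        if rung not in ladder:
            ladder.append(rung)
    return ladder


# Illustrative trial results only; not measurements.
CARTOON = [(300, 480, 80), (600, 720, 90), (1200, 1080, 96), (2500, 1080, 98)]
ACTION = [(300, 480, 55), (600, 720, 70), (1200, 1080, 82), (2500, 1080, 91), (5000, 1080, 96)]
TARGETS = [80, 90, 95]
```

With these sample numbers, the cartoon reaches the top quality target at 1200 kbps while the action title needs 5000 kbps, which is the saving per-title encoding is after.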

Recommendations and browse

The home page is assembled by a recommendation service (candidate generation + ML ranking per profile) producing personalized rows; catalog/search are microservices. These are cloud-side, cached aggressively, and independent of the streaming path.

Resilience: degrade, don’t fail

def home(profile):
    try:
        rows = recommendation_service.rows(profile)       # personalized
    except ServiceDown:
        rows = fallback_rows(profile.region)              # popular/curated default
    return rows

Every dependency has a fallback (a stale or generic response) behind a circuit breaker, so a single service failure degrades the experience instead of cascading. Chaos testing (deliberately killing instances in production) validates that the fallbacks actually work.
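The try/except in home() handles one failed call, but a circuit breaker goes further: after repeated failures it stops calling the dependency for a while and serves the fallback directly. Below is a minimal sketch of that pattern. The class, its thresholds, and the injectable clock are assumptions for illustration, not any specific library's API.

```python
import time


class CircuitBreaker:
    """Wrap a call with a fallback; stop calling after repeated failures."""

    def __init__(self, call, fallback, max_failures=3, reset_after=30.0,
                 clock=time.monotonic):
        self.call = call
        self.fallback = fallback
        self.max_failures = max_failures
        self.reset_after = reset_after    # seconds to stay open before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def __call__(self, *args):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return self.fallback(*args)   # open: skip the dependency entirely
            # Half-open: the reset window has elapsed, so allow one trial call.
        try:
            result = self.call(*args)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return self.fallback(*args)
        self.failures = 0
        self.opened_at = None                 # success closes the circuit
        return result
```

Used with the names from this section, home could be written as CircuitBreaker(recommendation_service.rows, lambda profile: fallback_rows(profile.region)). While the circuit is open, the failing service gets no traffic and has room to recover.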

Scale and failure handling

Streaming load → served from edge appliances; the cloud handles only control-plane requests (small).
Appliance failure → steer to the next-best; content is replicated across appliances.
Region/service outage → multi-region failover; fallbacks keep browse/playback working.
New release spike → pre-position widely before launch.
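The appliance-failure case can be sketched as a client that receives a ranked list of appliance base URLs and moves to the next one when a fetch fails. Whether the retry happens on the client or through a new steering request is an assumption here. The fetch callable is injected, so the sketch stays independent of any HTTP library.

```python
def fetch_with_failover(segment_path, ranked_base_urls, fetch):
    """Try each appliance in ranked order; return the first successful response.

    fetch: callable taking a URL and returning bytes, raising OSError
    (which includes ConnectionError and TimeoutError) on network failure.
    """
    last_error = None
    for base in ranked_base_urls:
        try:
            return fetch(f"{base.rstrip('/')}/{segment_path}")
        except OSError as err:
            last_error = err          # remember the failure and try the next appliance
    raise RuntimeError("all appliances failed") from last_error
```

This only works because content is replicated: the same segment path has to resolve on every appliance in the list.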

The takeaway

Concrete signals: cloud control plane authorizes + steers, Open Connect appliances deliver with off-peak pre-positioning inside ISPs, per-title offline encoding, ML recommendation rows, and circuit-breaker fallbacks proven by chaos testing. Pushing a known catalog to embedded edges (vs YouTube’s pull/transcode) is the efficiency play.