Building Netflix
Implement the playback authorization + manifest flow, edge steering to Open Connect appliances, off-peak content pre-positioning, and graceful fallback.
The play flow: cloud authorizes, edge delivers
The control plane (cloud) authorizes playback and hands back a manifest pointing at the best edge appliance; the data plane (Open Connect) streams the bytes:
def play(user, title_id, device):
    if not entitled(user, title_id):
        return 403  # subscription/region check
    profile = current_profile(user)
    renditions = catalog.renditions(title_id, device)  # device-appropriate ABR set
    appliance = steer(user, title_id)  # pick best Open Connect server
    manifest = build_manifest(renditions, base=appliance.url)
    bookmarks.touch(profile, title_id)  # for resume
    return {"manifest": manifest, "drm_license_url": drm.url(user, title_id)}
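To make the manifest step concrete, here is a minimal sketch of what build_manifest might do. The Rendition shape, the dict layout, and the example.net host are assumptions for illustration, not Netflix's actual manifest format; the only point is that every URL in the manifest is rooted at the appliance chosen by steering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    """One rung of the ABR ladder (hypothetical shape)."""
    name: str           # e.g. "1080p"
    bitrate_kbps: int
    path: str           # playlist path relative to the appliance root


def build_manifest(renditions, base):
    """Return a manifest-like dict whose URLs all point at one appliance."""
    base = base.rstrip("/")
    # Lowest bitrate first, so a player can start low and step up.
    ordered = sorted(renditions, key=lambda r: r.bitrate_kbps)
    return {
        "base": base,
        "renditions": [
            {
                "name": r.name,
                "bitrate_kbps": r.bitrate_kbps,
                "url": f"{base}/{r.path.lstrip('/')}",
            }
            for r in ordered
        ],
    }


# Illustrative values only.
ladder = [
    Rendition("1080p", 5800, "t123/1080p/index.m3u8"),
    Rendition("480p", 1050, "t123/480p/index.m3u8"),
]
manifest = build_manifest(ladder, base="https://oca.example.net/")
```

Because the appliance address appears only as the base, re-steering a session to a different appliance amounts to rebuilding the manifest with a new base.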
Edge steering
The steering service picks the appliance that minimizes cost/latency: one inside the user’s ISP if it has the title, else a nearby IXP appliance, considering health and load:
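def steer(user, title_id):
    candidates = appliances_with(title_id, near=user.ip)  # appliances that hold the title
    # False sorts before True, so ISP-embedded appliances win; ties go to lower load, then lower RTT
    return min(candidates, key=lambda a: (not a.in_user_isp(), a.load, a.rtt(user)))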
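The ranking above can be exercised with a small self-contained sketch. The Appliance fields, the pick_appliance name, and the fleet values are all hypothetical; the sketch also adds the health filter that the prose mentions and returns None when no appliance holds the title.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Appliance:
    """Hypothetical view of one Open Connect appliance."""
    name: str
    isp: Optional[str]   # None for an appliance at an IXP
    load: float          # 0.0 (idle) to 1.0 (saturated)
    rtt_ms: float        # measured round-trip time to the user
    healthy: bool
    titles: set


def pick_appliance(user_isp, title_id, appliances):
    """Prefer a healthy appliance in the user's ISP, then lower load, then lower RTT."""
    candidates = [a for a in appliances if a.healthy and title_id in a.titles]
    if not candidates:
        return None  # caller decides what to do when no edge holds the title
    # Tuples compare left to right; False sorts before True.
    return min(candidates, key=lambda a: (a.isp != user_isp, a.load, a.rtt_ms))


# Illustrative fleet only.
fleet = [
    Appliance("isp-a-1", "isp-a", 0.9, 4.0, True, {"t1"}),
    Appliance("ixp-1", None, 0.1, 12.0, True, {"t1", "t2"}),
    Appliance("isp-a-2", "isp-a", 0.2, 5.0, False, {"t1"}),
]
best = pick_appliance("isp-a", "t1", fleet)
```

With this strict ordering a heavily loaded in-ISP appliance still beats an idle IXP one. A production steering service would presumably cap load or blend the factors into a single cost; that is outside this sketch.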
Off-peak pre-positioning
Because the catalog and its popularity (by region) are predictable, push content to appliances during off-peak hours rather than pulling on demand:
def nightly_fill(appliance):
    predicted = popularity_model.top_titles(region=appliance.region, horizon="24h")
    for title in predicted:
        for rendition in title.renditions:
            if rendition not in appliance:
                appliance.prefetch(rendition)  # fill during low-traffic window
So at peak the bytes are already inside the ISP, and playback adds almost no backbone traffic.
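An appliance has finite disk, which the fill loop above does not model. As a hedged sketch, a fill planner could walk the predicted list in popularity order and keep only what fits. The plan_fill function, the (rendition_id, size) tuples, and the byte budget are assumptions for illustration.

```python
def plan_fill(predicted, stored, free_bytes):
    """Choose which renditions to prefetch tonight.

    predicted:  list of (rendition_id, size_bytes), most popular first
    stored:     set of rendition ids already on the appliance
    free_bytes: storage budget available for this fill window
    """
    plan = []
    for rendition_id, size in predicted:
        if rendition_id in stored:
            continue  # already on disk, nothing to transfer
        if size > free_bytes:
            continue  # does not fit; a smaller item further down still might
        plan.append(rendition_id)
        free_bytes -= size
    return plan


# Illustrative sizes in arbitrary units.
predicted = [("a", 50), ("b", 80), ("c", 30), ("d", 20)]
plan = plan_fill(predicted, stored={"a"}, free_bytes=100)
# plan == ["b", "d"]
```

This greedy pass is the simplest possible policy. It ignores eviction of stale content, which a real fill process would also have to handle.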
Per-title encoding (offline)
Encoding runs once per title, tuned to the content (a cartoon needs far less bitrate than an action scene at the same quality). Output is the full ABR ladder as segmented HLS/DASH, stored and distributed to appliances. No realtime transcode path.
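One simple way to illustrate "tuned to the content" is to pick, per title, the cheapest trial encode that meets a quality target. This is a sketch under stated assumptions, not Netflix's actual method: the cheapest_bitrate function, the quality scale, and every number below are made up for illustration.

```python
def cheapest_bitrate(trials, target_quality):
    """Return the lowest bitrate whose measured quality meets the target.

    trials: iterable of (bitrate_kbps, quality_score) from offline trial encodes.
    Returns None if no trial reaches the target.
    """
    good_enough = [bitrate for bitrate, quality in trials if quality >= target_quality]
    return min(good_enough) if good_enough else None


# Made-up measurements: simple animation reaches the target at a lower bitrate.
cartoon = [(500, 88), (1000, 94), (2000, 97)]
action = [(500, 70), (1000, 82), (2000, 91), (4000, 95)]

cartoon_kbps = cheapest_bitrate(cartoon, target_quality=90)  # 1000
action_kbps = cheapest_bitrate(action, target_quality=90)    # 2000
```

Repeating this for each quality target would yield a ladder specific to the title. Since it runs offline, the cost of the extra trial encodes is paid once per title rather than per playback.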
Recommendations and browse
The home page is assembled by a recommendation service (candidate generation + ML ranking per profile) producing personalized rows; catalog/search are microservices. These are cloud-side, cached aggressively, and independent of the streaming path.
Resilience: degrade, don’t fail
def home(profile):
    try:
        rows = recommendation_service.rows(profile)  # personalized
    except ServiceDown:
        rows = fallback_rows(profile.region)  # popular/curated default
    return rows
Every dependency has a fallback (a stale or generic response) behind a circuit breaker, so a single failing service degrades the experience instead of cascading. This behavior is validated by chaos testing (killing instances in production).
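The try/except in home handles one failure at a time; a circuit breaker additionally stops calling a dependency that keeps failing. Below is a minimal sketch of that pattern. The class, its thresholds, and the injectable clock are assumptions for illustration, not a specific library's API.

```python
import time


class CircuitBreaker:
    """Open after repeated failures, serve the fallback, retry after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after   # seconds to stay open
        self.clock = clock               # injectable so tests can control time
        self.failures = 0
        self.opened_at = None            # None means the breaker is closed

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()        # open: do not touch the dependency
            self.opened_at = None        # half-open: allow one trial call
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0                # success closes the breaker fully
        return result


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)
rows = breaker.call(lambda: ["personalized row"], lambda: ["popular row"])
```

While the breaker is open, callers get the fallback immediately instead of waiting on timeouts, which is what keeps one slow service from tying up its callers.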
Scale and failure handling
- Streaming load → served from edge appliances; the cloud handles only control-plane requests (small).
- Appliance failure → steer to the next-best; content is replicated across appliances.
- Region/service outage → multi-region failover; fallbacks keep browse/playback working.
- New release spike → pre-position widely before launch.
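The appliance-failure case can be sketched from the client's side: given appliances ranked by the steering service, try each in order until one serves the segment. The fetch_segment function, the injected fetch callable, and the host names are hypothetical; this assumes the same path is valid on every replica.

```python
def fetch_segment(ranked_bases, path, fetch):
    """Try each appliance in ranked order; return the first successful response.

    ranked_bases: appliance base URLs, best first
    fetch:        callable taking a URL and returning bytes, raising OSError on failure
    """
    last_error = None
    for base in ranked_bases:
        try:
            return fetch(f"{base.rstrip('/')}/{path}")
        except OSError as err:   # ConnectionError and TimeoutError are OSError subclasses
            last_error = err     # remember the failure and move to the next-best
    raise RuntimeError("all appliances failed") from last_error


# Fake transport for illustration: the first appliance is down.
def fake_fetch(url):
    if url.startswith("https://isp-oca.example.net"):
        raise ConnectionError("appliance unreachable")
    return b"segment-bytes"


data = fetch_segment(
    ["https://isp-oca.example.net", "https://ixp-oca.example.net"],
    "t123/480p/seg1.ts",
    fake_fetch,
)
```

Because content is replicated, the fallback appliance can serve the same segment, and the viewer sees at most a brief stall.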
The takeaway
The concrete signals to hit: a cloud control plane that authorizes and steers, Open Connect appliances that deliver from inside ISPs with off-peak pre-positioning, per-title offline encoding, ML-ranked recommendation rows, and circuit-breaker fallbacks proven by chaos testing. Pushing a known catalog to embedded edges (vs YouTube’s pull/transcode) is the efficiency play.