System design course
Ch.4 · Designing real systems · concept · 8 min read

Designing TikTok

A short-video app whose feed is driven by recommendation, not who you follow — the ML "For You" pipeline, video ingestion, and engagement-signal loop.


The problem

Design TikTok: users upload short videos and scroll an endless “For You” feed. The twist versus Instagram/Twitter: the feed is recommendation-driven, not built from a follow graph — the system decides what to show from the entire catalog based on predicted engagement. So the crux is a recommendation pipeline plus video ingestion at scale.

Step 1 — Requirements

Functional: upload short videos; a personalized For You feed (infinite scroll); likes/comments/shares/follows; the feed adapts quickly to your behavior.

Non-functional: extremely read/engagement-heavy, low-latency video start (instant playback), massive video storage/bandwidth, fast personalization (react to signals within a session), scale to billions of views/day.

Step 2 — Video ingestion (reuse + transcode)

Same media plane as Instagram, plus heavy transcoding:

  • Upload to object store via pre-signed URL.
  • Transcode to multiple resolutions/bitrates and segment for adaptive bitrate streaming (HLS/DASH) so playback adapts to network speed.
  • Serve via CDN; prefetch the next few videos so the next swipe plays instantly.
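The adaptive-bitrate idea above can be sketched as follows. This is a minimal illustration, not a production packager: the `Rendition` type, the three-rung `LADDER`, the `index.m3u8` path layout, and the 0.8 safety factor are all assumptions made for the example. Only the `#EXTM3U` and `#EXT-X-STREAM-INF` tags come from the HLS format itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str           # label such as "720p" (hypothetical naming)
    width: int
    height: int
    bandwidth_bps: int  # peak bitrate advertised to the player


# Hypothetical ladder for vertical video; real ladders are tuned per content.
LADDER = [
    Rendition("360p", 360, 640, 800_000),
    Rendition("540p", 540, 960, 1_600_000),
    Rendition("720p", 720, 1280, 3_000_000),
]


def master_playlist(renditions):
    """Build an HLS master playlist that lists one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for r in sorted(renditions, key=lambda r: r.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/index.m3u8")  # assumed per-rendition path
    return "\n".join(lines) + "\n"


def pick_rendition(renditions, measured_bps, safety=0.8):
    """Pick the highest bitrate that fits within safety * measured throughput.

    Falls back to the lowest rendition when nothing fits.
    """
    budget = measured_bps * safety
    affordable = [r for r in renditions if r.bandwidth_bps <= budget]
    if affordable:
        return max(affordable, key=lambda r: r.bandwidth_bps)
    return min(renditions, key=lambda r: r.bandwidth_bps)


if __name__ == "__main__":
    print(master_playlist(LADDER))
    print(pick_rendition(LADDER, measured_bps=2_500_000).name)
```

In practice the player makes this choice per segment as throughput changes, which is why the transcoder has to produce every rung of the ladder up front.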

Step 3 — The recommendation feed (the core)

Unlike a follow-graph feed, candidates come from the whole catalog:

  1. Candidate generation — pull a pool of candidate videos from many sources: trending, similar-to-liked (embedding nearest-neighbors), same creators/sounds, fresh content needing exposure. Uses ANN search over video/user embeddings.
  2. Ranking — an ML model predicts engagement (watch-time, like, share, completion, rewatch) for each candidate for this user, and orders them.
  3. Re-ranking / diversity — avoid repetition, inject exploration (new content), apply business/safety rules.

This is the same two-stage recall → rank pattern used for search and the news feed, but here recall comes from embedding/ANN retrieval over the whole catalog rather than from follow-graph fan-out.
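The three stages can be sketched end to end in a few lines. Everything here is a toy under stated assumptions: brute-force cosine similarity stands in for a real ANN index, a hand-weighted score (0.7 similarity, 0.3 completion rate) stands in for the ML ranking model, and the diversity rule is simply "at most one video per creator". The field names and the `CATALOG` data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall(user_vec, catalog, k):
    """Stage 1: candidate generation. Brute force here; an ANN index in practice."""
    by_similarity = sorted(
        catalog, key=lambda v: cosine(user_vec, v["embedding"]), reverse=True
    )
    return by_similarity[:k]


def rank(user_vec, candidates):
    """Stage 2: ranking. A hand-weighted score standing in for a learned model."""
    def score(v):
        return 0.7 * cosine(user_vec, v["embedding"]) + 0.3 * v["completion_rate"]
    return sorted(candidates, key=score, reverse=True)


def diversify(ranked, max_per_creator=1):
    """Stage 3: re-ranking. Cap how many videos one creator gets in the feed."""
    seen = {}
    out = []
    for v in ranked:
        count = seen.get(v["creator"], 0)
        if count < max_per_creator:
            out.append(v)
            seen[v["creator"]] = count + 1
    return out


def for_you(user_vec, catalog, recall_k=3, feed_size=2):
    """Run recall, then rank, then diversity re-rank, and cut to feed size."""
    return diversify(rank(user_vec, recall(user_vec, catalog, recall_k)))[:feed_size]


# Hypothetical catalog with 2-dimensional embeddings for readability.
CATALOG = [
    {"id": "v1", "creator": "a", "embedding": [1.0, 0.0], "completion_rate": 0.9},
    {"id": "v2", "creator": "a", "embedding": [0.9, 0.1], "completion_rate": 0.5},
    {"id": "v3", "creator": "b", "embedding": [0.7, 0.7], "completion_rate": 0.8},
    {"id": "v4", "creator": "c", "embedding": [0.0, 1.0], "completion_rate": 0.99},
]

if __name__ == "__main__":
    print([v["id"] for v in for_you([1.0, 0.0], CATALOG)])
```

Note that `v4` has the best completion rate but never reaches the ranker, because recall filtered it out first. That is the point of the two-stage split: the expensive model only scores a small pool.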

Step 4 — The engagement feedback loop (what makes it “scary good”)

Signals flow back fast: watch time, rewatches, swipe-aways, and likes are streamed into feature updates and short-term user-interest vectors within the session, so the very next batch of recommendations reflects what you just did. This tight loop is the product.
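One simple way to picture the in-session update is an exponential moving average over video embeddings, where each signal pulls the interest vector toward (or pushes it away from) the video just watched. This is an illustrative sketch only: the signal names, their weights, and the `rate` value are assumptions, and a real system would likely use a learned update rather than a fixed rule.

```python
# Hypothetical signal weights: positive pulls interest toward the video,
# negative pushes it away.
SIGNAL_WEIGHTS = {
    "rewatch": 1.0,
    "like": 0.8,
    "complete": 0.6,
    "swipe_away": -0.5,
}


def update_interest(interest, video_embedding, signal, rate=0.3):
    """Return a new interest vector after one engagement event.

    Each component moves by rate * weight * (embedding - interest), so a
    positive signal is an exponential moving average toward the video and a
    negative signal moves the vector in the opposite direction.
    """
    step = rate * SIGNAL_WEIGHTS[signal]
    return [i + step * (e - i) for i, e in zip(interest, video_embedding)]


if __name__ == "__main__":
    interest = [0.0, 0.0]
    interest = update_interest(interest, [1.0, 0.0], "like")
    interest = update_interest(interest, [0.0, 1.0], "swipe_away")
    print(interest)
```

The updated vector would then be the query for the next candidate-generation call, which is how a single swipe can change the next batch.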

Step 5 — Architecture

upload → object store → transcode (variants + HLS segments) → CDN
view  → recommendation service: candidate gen (ANN over embeddings) → ML ranking → feed
engagement events → stream → feature store / interest vectors → next recommendations

  • Recommendation service (candidate gen + ranking) backed by a feature store and embedding/ANN index.
  • Engagement stream (Kafka) continuously updates features and feeds model training.

Step 6 — Scale

  • Video in object store + CDN with prefetch; transcoding is a big async fleet.
  • Embeddings/ANN index sharded; feature store low-latency.
  • Engagement event volume is enormous: events are ingested as a stream, then processed in batch for model training and in real time for in-session interest.
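Sharding the embedding index usually implies a scatter-gather query: ask every shard for its local top-k, then merge. The sketch below assumes brute-force dot-product search inside each shard as a stand-in for a per-shard ANN index; the shard layout and function names are hypothetical.

```python
import heapq


def search_shard(shard, query, k):
    """Return the shard's local top-k as (score, video_id) pairs.

    Brute-force dot product here; a real shard would query its ANN index.
    """
    scored = (
        (sum(q * x for q, x in zip(query, emb)), vid)
        for vid, emb in shard.items()
    )
    return heapq.nlargest(k, scored)


def search_all(shards, query, k):
    """Scatter the query to every shard, then merge the partial top-k lists."""
    partials = [hit for shard in shards for hit in search_shard(shard, query, k)]
    return [vid for _, vid in heapq.nlargest(k, partials)]


if __name__ == "__main__":
    shards = [
        {"v1": [1.0, 0.0], "v2": [0.2, 0.9]},
        {"v3": [0.8, 0.1], "v4": [0.0, 1.0]},
    ]
    print(search_all(shards, [1.0, 0.0], k=2))
```

Because each shard returns its own top-k, the merged result is the exact global top-k over whatever the shards found; any approximation comes only from the per-shard index.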

Trade-offs to raise

  • Recommendation recall (whole-catalog ANN) vs follow-graph fan-out — TikTok chooses recommendation, which is why new creators can go viral.
  • Exploration vs exploitation — must show some new/uncertain content to learn and to give creators reach, at the cost of some short-term engagement.
  • Real-time personalization vs cost — tighter loops cost more stream/compute.
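The exploration vs exploitation trade-off can be made concrete with an epsilon-greedy style mixer: each feed slot takes a fresh, unproven video with some small probability and the next best ranked video otherwise. This is one simple policy among many (bandit methods are a common alternative), and the function name, the `explore_rate` parameter, and the sample IDs are assumptions for illustration.

```python
import random


def mix_feed(ranked, fresh, explore_rate, rng):
    """Interleave ranked videos with fresh ones, slot by slot.

    For each slot, take the next fresh video with probability explore_rate,
    otherwise the next ranked video. When one list runs out, drain the other.
    """
    ranked = list(ranked)
    fresh = list(fresh)
    feed = []
    while ranked or fresh:
        explore = bool(fresh) and (not ranked or rng.random() < explore_rate)
        feed.append(fresh.pop(0) if explore else ranked.pop(0))
    return feed


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    print(mix_feed(["r1", "r2", "r3", "r4"], ["f1", "f2"], 0.2, rng))
```

Raising `explore_rate` gives new creators more reach and the system more learning signal, at the cost of showing fewer videos the ranker is already confident about.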

The interview cue

“Video is ingested like Instagram but transcoded to ABR segments and served from a CDN with prefetch for instant playback; the For You feed is a recommendation pipeline: candidate generation via ANN over embeddings (not a follow graph), ML ranking on predicted watch-time/engagement, and diversity re-ranking, with a real-time engagement loop updating interest within the session.” Recommendation recall + ranking + fast feedback is the defining answer; implementation next.