System design course
Ch.4 · Designing real systems · how to build it

Building TikTok

Implement ANN candidate retrieval over embeddings, the engagement-prediction ranker, video prefetch for instant playback, and the real-time interest loop.


The recommendation read path

def for_you(user_id, k=10):
    user_vec = interest_store.get(user_id)              # short+long-term interest embedding
    candidates = []
    candidates += ann_index.search(user_vec, n=500)     # similar to what you engage with
    candidates += trending_pool(region(user_id), n=200) # trending
    candidates += exploration_pool(n=100)               # fresh/uncertain content (learn + creator reach)
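    candidates = dedupe_unseen(candidates, user_id)     # drop repeats and already-watched videos
    features = feature_store.batch(user_id, candidates)
    scores = ranking_model.predict(features)            # P(watch-through, like, share, rewatch), combined into one score per candidate
    scored = sorted(zip(candidates, scores), key=lambda cs: cs[1], reverse=True)  # keep each score paired with its video
    ranked = diversity_rerank(scored)                   # avoid same-creator/sound repetition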
    return ranked[:k]

Recall here is ANN search over embeddings, not fan-out over a follow graph: users and videos are represented as vectors, and the candidates are a user's nearest neighbors among the videos. That's how content from creators you don't follow reaches you.
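Because several sources feed the same candidate list, the same video can show up twice, or show up after the user has already watched it. One possible shape for the merge step is sketched below. It assumes a variant of `dedupe_unseen` that takes the user's seen-set directly rather than a `user_id`, plus a hypothetical `merge_sources` helper that tags each id with the source that produced it. In production the seen-set would more likely be a compact probabilistic structure such as a Bloom filter; a plain `set` keeps the sketch readable.

```python
from __future__ import annotations


def dedupe_unseen(candidates: list[str], seen: set[str]) -> list[str]:
    """Drop already-watched ids and repeats, keeping first occurrences in order."""
    out: list[str] = []
    emitted: set[str] = set()
    for vid in candidates:
        if vid in seen or vid in emitted:
            continue
        emitted.add(vid)
        out.append(vid)
    return out


def merge_sources(sources: dict[str, list[str]], seen: set[str]) -> list[tuple[str, str]]:
    """Merge candidate sources in priority order, tagging each id with the source that
    produced it. Dicts preserve insertion order, so earlier sources win duplicates."""
    tagged: list[tuple[str, str]] = []
    for source, ids in sources.items():
        already = seen | {vid for vid, _ in tagged}  # seen by user or emitted by an earlier source
        tagged += [(vid, source) for vid in dedupe_unseen(ids, already)]
    return tagged


seen = {"v3"}
sources = {"ann": ["v1", "v2", "v3"], "trending": ["v2", "v4"], "explore": ["v5", "v1"]}
print(merge_sources(sources, seen))
# [('v1', 'ann'), ('v2', 'ann'), ('v4', 'trending'), ('v5', 'explore')]
```

Keeping the source tag around is a design choice: the ranker can use it as a feature, and exploration impressions can be counted separately from ANN ones.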

ANN candidate retrieval

Videos and users are embedded into the same vector space (from a trained model); approximate nearest-neighbor search (HNSW/IVF, e.g. FAISS) finds candidate videos close to the user’s interest vector in milliseconds:

# offline: embed every video; build/refresh a sharded ANN index
# online:  ann_index.search(user_vec, n) → top-n similar video ids
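To make "approximate" concrete, here is a toy inverted-file (IVF) index in pure Python. It is a sketch of the IVF idea only (HNSW is a different, graph-based structure). It assumes the coarse centroids were computed offline (typically by k-means) and that similarity is the dot product of L2-normalized vectors. Search scans only the `nprobe` posting lists whose centroids sit closest to the query, which is where both the speed and the missed neighbors come from. FAISS exposes a comparable knob as `nprobe` on its IVF indexes; the class and method names below are hypothetical.

```python
from __future__ import annotations

import heapq
import math


def normalize(v: list[float]) -> list[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class IVFIndex:
    """Coarse centroids plus one posting list of (video_id, vector) per centroid."""

    def __init__(self, centroids: list[list[float]]):
        self.centroids = [normalize(c) for c in centroids]
        self.lists: list[list[tuple[str, list[float]]]] = [[] for _ in centroids]

    def add(self, video_id: str, vec: list[float]) -> None:
        v = normalize(vec)
        # Assign the video to the posting list of its most similar centroid.
        best = max(range(len(self.centroids)), key=lambda i: dot(v, self.centroids[i]))
        self.lists[best].append((video_id, v))

    def search(self, query: list[float], n: int, nprobe: int = 2) -> list[str]:
        q = normalize(query)
        # Probe only the nprobe closest lists; neighbors in other lists are never scored.
        probe = heapq.nlargest(nprobe, range(len(self.centroids)),
                               key=lambda i: dot(q, self.centroids[i]))
        scored = ((dot(q, v), vid) for i in probe for vid, v in self.lists[i])
        return [vid for _, vid in heapq.nlargest(n, scored)]


index = IVFIndex(centroids=[[1, 0], [0, 1]])  # e.g. a "cooking" and a "gaming" direction
index.add("pasta", [0.9, 0.1])
index.add("ramen", [0.8, 0.3])
index.add("speedrun", [0.1, 0.9])
print(index.search([1.0, 0.2], n=2, nprobe=1))  # ['pasta', 'ramen']
```

Raising `nprobe` trades latency for recall: with `nprobe=1` and `n=3` the sketch can only return two videos, because "speedrun" lives in an unprobed list.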

Ranking and diversity

The ranker predicts engagement from features (user-creator affinity, freshness, video stats, your recent behavior), weighting watch time and completion most heavily because they are TikTok's primary signal. A diversity re-rank pass then prevents, for example, five clips using the same sound in a row, and injects exploration candidates.
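A minimal sketch of both stages follows, assuming the model emits one probability per engagement head and that a hand-set weighted sum turns them into a single value. The head names, the `WEIGHTS`, and the greedy penalty scheme are all hypothetical illustrations, not TikTok's actual formula. The re-rank repeatedly picks the best remaining video, halving its value for each attribute (creator, sound) it shares with the last few picks.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Scored:
    video_id: str
    creator: str
    sound: str
    preds: dict[str, float]  # hypothetical model heads, e.g. {"watch_through": 0.9}


# Hypothetical weights: watch-through dominates, mirroring the watch-time emphasis.
WEIGHTS = {"watch_through": 1.0, "rewatch": 0.5, "share": 0.3, "like": 0.2}


def value(item: Scored) -> float:
    """Collapse per-head probabilities into one ranking value (weighted sum)."""
    return sum(w * item.preds.get(head, 0.0) for head, w in WEIGHTS.items())


def diversity_rerank(items: list[Scored], k: int, window: int = 3,
                     penalty: float = 0.5) -> list[Scored]:
    """Greedy re-rank: multiply an item's value by `penalty` for each attribute
    (creator, sound) it shares with any of the last `window` picks."""
    remaining = list(items)
    picked: list[Scored] = []
    while remaining and len(picked) < k:
        recent = picked[-window:]

        def adjusted(it: Scored) -> float:
            v = value(it)
            if any(p.creator == it.creator for p in recent):
                v *= penalty
            if any(p.sound == it.sound for p in recent):
                v *= penalty
            return v

        best_i = max(range(len(remaining)), key=lambda i: adjusted(remaining[i]))
        picked.append(remaining.pop(best_i))
    return picked


items = [
    Scored("a", "c1", "s1", {"watch_through": 0.9}),
    Scored("b", "c1", "s2", {"watch_through": 0.85}),
    Scored("c", "c2", "s1", {"watch_through": 0.8}),
    Scored("d", "c3", "s3", {"watch_through": 0.5}),
]
print([it.video_id for it in diversity_rerank(items, k=4)])  # ['a', 'd', 'b', 'c']
```

Sorting by raw value alone would give a, b, c; the penalty pushes the lower-scored but unrelated "d" up to second place because "b" repeats a's creator and "c" repeats a's sound.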

Instant playback via prefetch

The killer UX detail: the next video plays the instant you swipe. The client prefetches the next few recommended videos’ first segments while you watch the current one:

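def serve_feed(user_id):
    feed = for_you(user_id, k=10)
    # return the whole batch; flag the first 3 so the client prefetches their first segments
    return [{"id": v.id, "manifest": cdn_hls(v), "prefetch": i < 3} for i, v in enumerate(feed)]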
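On the client, the prefetch flag might drive a small sliding window: keep the first segment of the next `depth` videos local, and top the window back up on every swipe. The sketch below assumes a hypothetical `fetch_first_segment` callable standing in for the HTTP GET of segment 0 from the HLS manifest. It runs synchronously to stay testable, whereas a real player would download in the background while the current video plays.

```python
from __future__ import annotations

from collections import deque
from itertools import islice
from typing import Callable


class Prefetcher:
    """Keep the first segment of the next `depth` upcoming videos cached."""

    def __init__(self, fetch_first_segment: Callable[[str], bytes], depth: int = 3):
        self.fetch = fetch_first_segment  # hypothetical: downloads segment 0 of a video
        self.depth = depth
        self.upcoming: deque[str] = deque()
        self.cache: dict[str, bytes] = {}

    def enqueue(self, video_ids: list[str]) -> None:
        self.upcoming.extend(video_ids)
        self._fill()

    def _fill(self) -> None:
        # Fetch any missing first segments within the window.
        for vid in islice(self.upcoming, self.depth):
            if vid not in self.cache:
                self.cache[vid] = self.fetch(vid)

    def swipe(self) -> tuple[str, bytes, bool]:
        """Advance to the next video; returns (id, first segment, was_cache_hit)."""
        vid = self.upcoming.popleft()
        segment = self.cache.pop(vid, None)
        hit = segment is not None
        if segment is None:  # cache miss: fetch on demand, which the user sees as a stall
            segment = self.fetch(vid)
        self._fill()         # slide the window forward
        return vid, segment, hit


fetched: list[str] = []


def fake_fetch(vid: str) -> bytes:  # stand-in for a CDN request
    fetched.append(vid)
    return f"seg0:{vid}".encode()


p = Prefetcher(fake_fetch, depth=3)
p.enqueue(["v1", "v2", "v3", "v4", "v5"])
print(fetched)       # ['v1', 'v2', 'v3']
print(p.swipe()[2])  # True: v1's first segment was already local
print(fetched)       # ['v1', 'v2', 'v3', 'v4']
```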

Videos are segmented for adaptive-bitrate (ABR) streaming on the CDN, so playback adapts to available bandwidth and can start as soon as a small first segment arrives.
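Two small calculations show why this works. The sketch assumes a simple throughput-based ABR rule (pick the highest rendition whose bitrate fits within a safety fraction of measured throughput) and a hypothetical four-rung ladder. Real players use more elaborate buffer-aware logic. The second function is the arithmetic behind "small first segment": time to fetch one segment is bitrate times duration divided by throughput, ignoring latency and protocol overhead.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    height: int        # vertical resolution, e.g. 720 for 720p
    bitrate_kbps: int


# Hypothetical ladder; real ladders are tuned per title and device.
LADDER = [Rendition(240, 400), Rendition(480, 1000), Rendition(720, 2500), Rendition(1080, 5000)]


def pick_rendition(throughput_kbps: float, ladder: list[Rendition] = LADDER,
                   safety: float = 0.8) -> Rendition:
    """Highest bitrate within `safety` x measured throughput; lowest rung if nothing fits."""
    rungs = sorted(ladder, key=lambda r: r.bitrate_kbps)
    fitting = [r for r in rungs if r.bitrate_kbps <= throughput_kbps * safety]
    return fitting[-1] if fitting else rungs[0]


def startup_delay_s(bitrate_kbps: float, segment_s: float, throughput_kbps: float) -> float:
    """Seconds to download one segment: (bitrate x duration) / throughput."""
    return bitrate_kbps * segment_s / throughput_kbps


print(pick_rendition(4000).height)     # 720  (budget is 3200 kbps)
print(pick_rendition(300).height)      # 240  (nothing fits; lowest rung)
print(startup_delay_s(2500, 2, 5000))  # 1.0
print(startup_delay_s(2500, 6, 5000))  # 3.0
```

At the same bitrate and bandwidth, a 2-second first segment is ready in a third of the time a 6-second one needs, which is why a short opening segment matters more for swipe-to-play than raw bandwidth.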

The real-time interest loop

Every engagement event updates the user’s short-term interest vector almost immediately, so the next batch reflects the current session:

def on_engagement(user_id, video_id, signal):          # watch_time, like, skip, share
    event_stream.publish({"user": user_id, "video": video_id, "signal": signal})
# stream processor: update interest_store[user_id] (decayed blend) within seconds
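The "decayed blend" the stream processor applies could look like an exponential moving average over engaged videos' embeddings. This sketch assumes a per-signal weight table (`SIGNAL_WEIGHT`, hypothetical, with skips pushing the vector away) and a blend rate `alpha`; each event shrinks the influence of older events by a factor of `(1 - alpha)`, so the vector tracks the current session. Only the short-term vector is modeled here; combining it with the long-term one is left out.

```python
from __future__ import annotations

import math

# Hypothetical weights: positive signals pull toward the video, skips push away.
SIGNAL_WEIGHT = {"watch_through": 1.0, "share": 1.0, "like": 0.8, "skip": -0.5}


def update_interest(short_term: list[float], video_vec: list[float], signal: str,
                    alpha: float = 0.2) -> list[float]:
    """new = (1 - alpha) * old + alpha * w * video, then L2-normalize."""
    w = SIGNAL_WEIGHT.get(signal, 0.0)  # unknown signals only decay, then renormalize
    blended = [(1 - alpha) * s + alpha * w * v for s, v in zip(short_term, video_vec)]
    norm = math.sqrt(sum(x * x for x in blended))
    return [x / norm for x in blended] if norm > 0 else blended


vec = [1.0, 0.0]        # session so far: all "gaming" (axis 0)
for _ in range(4):      # four cooking videos (axis 1) watched through
    vec = update_interest(vec, [0.0, 1.0], "watch_through")
print([round(x, 2) for x in vec])  # [0.67, 0.74]
```

Four engagements are enough for the cooking direction to overtake the earlier interest, which is the "watch a few cooking videos, get more" behavior in miniature.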

This tight loop — watch a few cooking videos, get more — is the product’s magic.

Video ingestion

Same as Instagram plus transcoding: upload → object store → async transcode to ABR renditions + HLS segments → CDN. New videos enter the exploration pool to gather initial signals before the model can rank them confidently (the cold-start path).
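The text leaves open how the exploration pool hands out impressions. One standard option, swapped in here for illustration, is the UCB1 bandit: every untried video gets an infinite bonus, so each new upload is guaranteed some views, after which videos with high observed completion (or still-uncertain ones) keep getting shown. The class name, reward definition (1 if watched through), and graduation thresholds are all hypothetical.

```python
from __future__ import annotations

import math


class ExplorationPool:
    """UCB1 over fresh videos; reward is 1 for a watch-through impression, else 0."""

    def __init__(self, video_ids: list[str]):
        self.shows = {v: 0 for v in video_ids}
        self.completions = {v: 0 for v in video_ids}
        self.total = 0

    def pick(self) -> str:
        def ucb(v: str) -> float:
            n = self.shows[v]
            if n == 0:
                return math.inf  # untried: show it at least once
            mean = self.completions[v] / n
            return mean + math.sqrt(2 * math.log(self.total) / n)
        return max(self.shows, key=ucb)

    def record(self, video_id: str, watched_through: bool) -> None:
        self.shows[video_id] += 1
        self.completions[video_id] += int(watched_through)
        self.total += 1

    def graduated(self, min_shows: int = 50, min_rate: float = 0.4) -> list[str]:
        """Videos with enough signal to hand to the main ranker (thresholds hypothetical)."""
        return [v for v, n in self.shows.items()
                if n >= min_shows and self.completions[v] / n >= min_rate]


pool = ExplorationPool(["a", "b", "c"])
for _ in range(300):
    v = pool.pick()
    pool.record(v, watched_through=(v == "a"))  # simulated audience: only "a" resonates
print(pool.graduated())  # ['a']
```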

Scale and failure handling

  • Engagement events arrive at massive volume, so they are stream-processed (Kafka + Flink): batch jobs consume them for model training, stream jobs for real-time interest updates.
  • ANN index sharded and replicated; refreshed as embeddings update.
  • Video on CDN with prefetch; transcoding is an autoscaled fleet.
  • Ranking service down → fall back to trending/popular (degrade gracefully).
  • Cold-start (new user/video) → lean on trending + exploration until signals accumulate.
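The "degrade gracefully" bullet can be sketched as a deadline around the ranking call: if the ranker errors out or misses its latency budget, serve the fallback list (trending/popular) instead of an error. The function name, the 150 ms budget, and the thread-pool approach are assumptions for illustration; a real service would more likely rely on its RPC framework's deadlines and circuit breakers.

```python
from __future__ import annotations

import concurrent.futures as cf
from typing import Callable

_pool = cf.ThreadPoolExecutor(max_workers=8)


def ranked_or_fallback(rank: Callable[[], list[str]], fallback: Callable[[], list[str]],
                       timeout_s: float = 0.15) -> tuple[list[str], str]:
    """Call the ranker with a deadline; on timeout or error, return the fallback list.
    The second element records which path served the request (useful for metrics)."""
    future = _pool.submit(rank)
    try:
        return future.result(timeout=timeout_s), "ranked"
    except cf.TimeoutError:
        future.cancel()  # best effort: a call already running keeps running in its thread
        return fallback(), "fallback:timeout"
    except Exception:
        return fallback(), "fallback:error"


print(ranked_or_fallback(lambda: ["r1", "r2"], lambda: ["t1", "t2"]))
# (['r1', 'r2'], 'ranked')
```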

The takeaway

Concrete signals: ANN-over-embeddings candidate generation (whole-catalog recall, not follow-graph), watch-time-centric ML ranking + diversity, CDN ABR + prefetch for instant playback, and a real-time engagement loop updating interest within the session. The recall mechanism is the key difference from the social feeds — recommendation, not who you follow.