Designing TikTok
A short-video app whose feed is driven by recommendation, not who you follow — the ML "For You" pipeline, video ingestion, and engagement-signal loop.
The problem
Design TikTok: users upload short videos and scroll an endless “For You” feed. The twist versus Instagram/Twitter: the feed is recommendation-driven, not built from a follow graph — the system decides what to show from the entire catalog based on predicted engagement. So the crux is a recommendation pipeline plus video ingestion at scale.
Step 1 — Requirements
Functional: upload short videos; a personalized For You feed (infinite scroll); likes/comments/shares/follows; the feed adapts quickly to your behavior.
Non-functional: an extremely read- and engagement-heavy workload; low-latency video start (instant playback); massive video storage and bandwidth; fast personalization (reacting to signals within a session); scale to billions of views/day.
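A quick back-of-envelope pass makes "billions of views/day" concrete. Every input below (one billion views per day, 2 MB per delivered video, a 3x peak factor) is an assumed, illustrative number rather than a real TikTok figure; only the arithmetic is the point.

```python
# Back-of-envelope sizing. All inputs are assumptions for illustration only.
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(views_per_day: float, avg_video_mb: float, peak_factor: float) -> dict:
    """Turn a daily view volume into average/peak view rate and average egress."""
    avg_views_per_sec = views_per_day / SECONDS_PER_DAY
    peak_views_per_sec = avg_views_per_sec * peak_factor
    # MB per second -> megabits per second (x8) -> gigabits per second (/1000).
    avg_egress_gbps = avg_views_per_sec * avg_video_mb * 8 / 1000
    return {
        "avg_views_per_sec": avg_views_per_sec,
        "peak_views_per_sec": peak_views_per_sec,
        "avg_egress_gbps": avg_egress_gbps,
    }


est = estimate(views_per_day=1e9, avg_video_mb=2.0, peak_factor=3.0)
for name, value in est.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the average is already on the order of ten thousand video starts per second, which is why nearly all bytes have to come from the CDN rather than the origin.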
Step 2 — Video ingestion (reuse + transcode)
The media plane is the same as in the Instagram design (object store + CDN), with heavy transcoding added:
- Upload to object store via pre-signed URL.
- Transcode to multiple resolutions/bitrates and segment for adaptive bitrate streaming (HLS/DASH) so playback adapts to network speed.
- Serve via CDN; prefetch the next few videos so the next swipe plays instantly.
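As a rough sketch of the transcode output, a job might describe its rendition ladder and emit an HLS master playlist from it. The ladder values, the `Rendition` type, and the `video_id/name/index.m3u8` path layout are hypothetical; the `#EXTM3U` and `#EXT-X-STREAM-INF` tags with `BANDWIDTH` and `RESOLUTION` attributes come from the HLS specification. `pick_rendition` is a simplified stand-in for the player's adaptive logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    bandwidth_bps: int  # peak bitrate advertised to the player


# Hypothetical ladder for vertical short video.
LADDER = [
    Rendition("360p", 360, 640, 800_000),
    Rendition("540p", 540, 960, 1_500_000),
    Rendition("720p", 720, 1280, 3_000_000),
]


def master_playlist(video_id: str, ladder: list[Rendition]) -> str:
    """Build an HLS master playlist that lists one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for r in sorted(ladder, key=lambda r: r.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{video_id}/{r.name}/index.m3u8")  # assumed path layout
    return "\n".join(lines) + "\n"


def pick_rendition(ladder: list[Rendition], measured_bps: float,
                   safety: float = 0.8) -> Rendition:
    """Pick the highest rendition that fits within a safety fraction of the
    measured throughput; fall back to the lowest one if nothing fits."""
    budget = measured_bps * safety
    fitting = [r for r in ladder if r.bandwidth_bps <= budget]
    if fitting:
        return max(fitting, key=lambda r: r.bandwidth_bps)
    return min(ladder, key=lambda r: r.bandwidth_bps)


playlist = master_playlist("vid123", LADDER)
print(playlist)
print(pick_rendition(LADDER, measured_bps=2_000_000).name)
```

Real players switch renditions per segment using buffer level as well as throughput; the single-threshold rule here only illustrates why the ladder exists.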
Step 3 — The recommendation feed (the core)
Unlike a follow-graph feed, candidates come from the whole catalog:
- Candidate generation — pull a pool of candidate videos from many sources: trending, similar-to-liked (embedding nearest-neighbors), same creators/sounds, fresh content needing exposure. Uses ANN search over video/user embeddings.
- Ranking — an ML model predicts engagement (watch-time, like, share, completion, rewatch) for each candidate for this user, and orders them.
- Re-ranking / diversity — avoid repetition, inject exploration (new content), apply business/safety rules.
This is the same two-stage recall → rank pattern used in search and the news feed, followed by a re-ranking pass; the difference is that recall comes from recommendation (embeddings/ANN) rather than follow-graph fan-out.
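The three stages can be sketched end to end in a few lines. This is a toy under stated assumptions: recall is a brute-force cosine scan standing in for an ANN index, `predicted_engagement` is a hand-written stand-in for the ML ranking model, and the `Video` fields, weights, and sample catalog are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    video_id: str
    creator: str
    embedding: tuple[float, ...]
    quality: float  # hypothetical per-video prior, e.g. historical completion rate


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall(user_vec, catalog, k):
    """Stage 1: nearest neighbors by cosine similarity.
    Brute force here; a production system would query an ANN index."""
    return sorted(catalog, key=lambda v: cosine(user_vec, v.embedding),
                  reverse=True)[:k]


def predicted_engagement(user_vec, video) -> float:
    """Toy stand-in for the ranking model: blends user affinity with the
    video's quality prior. The 0.6/0.4 weights are illustrative."""
    return 0.6 * cosine(user_vec, video.embedding) + 0.4 * video.quality


def rank(user_vec, candidates):
    """Stage 2: order candidates by predicted engagement for this user."""
    return sorted(candidates, key=lambda v: predicted_engagement(user_vec, v),
                  reverse=True)


def rerank_diverse(ranked, max_per_creator, n):
    """Stage 3: greedy pass that caps how many videos one creator contributes."""
    feed, per_creator = [], {}
    for v in ranked:
        if per_creator.get(v.creator, 0) >= max_per_creator:
            continue
        feed.append(v)
        per_creator[v.creator] = per_creator.get(v.creator, 0) + 1
        if len(feed) == n:
            break
    return feed


catalog = [
    Video("a1", "ann", (1.0, 0.0), 0.90),
    Video("a2", "ann", (0.9, 0.1), 0.80),
    Video("b1", "bob", (0.8, 0.2), 0.50),
    Video("c1", "cy", (0.0, 1.0), 0.99),
    Video("d1", "dee", (0.7, 0.3), 0.95),
]
user_vec = (1.0, 0.0)

candidates = recall(user_vec, catalog, k=4)
feed = rerank_diverse(rank(user_vec, candidates), max_per_creator=1, n=3)
print([v.video_id for v in feed])  # ['a1', 'd1', 'b1']
```

Note how each stage changes the result: recall drops `c1` despite its high quality because it is far from the user's vector, ranking lifts `d1` above closer but weaker videos, and the creator cap removes `a2`.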
Step 4 — The engagement feedback loop (what makes it “scary good”)
Signals flow back fast: watch time, rewatches, swipe-aways, likes — streamed into feature updates and short-term user-interest vectors within the session, so the very next batch of recommendations reflects what you just did. This tight loop is the product.
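One simple way to picture the in-session update is an exponential-moving-average step on a short-term interest vector. This is a minimal sketch, not a description of TikTok's actual method: the signal names, their weights, and the learning rate are assumptions chosen for illustration.

```python
# Hypothetical signal weights: positive values pull the interest vector toward
# the video's embedding, negative values push it away.
SIGNAL_WEIGHT = {
    "rewatch": 1.0,
    "like": 0.8,
    "complete": 0.6,
    "swipe_away": -0.5,
}


def update_interest(interest, video_emb, signal, rate=0.3):
    """One moving-average step: interest += rate * weight * (video - interest)."""
    w = SIGNAL_WEIGHT[signal]
    return [i + rate * w * (e - i) for i, e in zip(interest, video_emb)]


# A session starting from a neutral vector.
interest = [0.0, 0.0]
interest = update_interest(interest, [1.0, 0.0], "rewatch")     # moves toward topic 0
interest = update_interest(interest, [0.0, 1.0], "swipe_away")  # moves away from topic 1
print(interest)
```

The updated vector can be used directly as the query for the next candidate-generation call, which is what lets the next batch reflect the last few swipes.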
Step 5 — Architecture
upload → object store → transcode (variants + HLS segments) → CDN
view → recommendation service: candidate gen (ANN over embeddings) → ML ranking → feed
engagement events → stream → feature store / interest vectors → next recommendations
- Recommendation service (candidate gen + ranking) backed by a feature store and embedding/ANN index.
- Engagement stream (Kafka) continuously updates features in near real time; the same logged events feed batch model training.
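The engagement path can be pictured as a consumer that folds events into per-video features. The sketch below uses a plain Python iterable in place of a real Kafka consumer and an in-memory dictionary in place of a real feature store; `EngagementEvent`, `VideoFeatureStore`, and the event kinds are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EngagementEvent:
    user_id: str
    video_id: str
    kind: str  # assumed kinds: "view", "complete", "like", "share"


class VideoFeatureStore:
    """In-memory stand-in for a low-latency feature store."""

    def __init__(self):
        # video_id -> event kind -> count
        self._counts = defaultdict(lambda: defaultdict(int))

    def apply(self, event: EngagementEvent) -> None:
        self._counts[event.video_id][event.kind] += 1

    def completion_rate(self, video_id: str) -> float:
        counts = self._counts[video_id]
        views = counts["view"]
        return counts["complete"] / views if views else 0.0


def consume(events, store: VideoFeatureStore) -> None:
    """Fold a stream of events into the store, one event at a time."""
    for event in events:
        store.apply(event)


store = VideoFeatureStore()
consume(
    [
        EngagementEvent("u1", "v1", "view"),
        EngagementEvent("u1", "v1", "complete"),
        EngagementEvent("u2", "v1", "view"),
        EngagementEvent("u3", "v1", "view"),
        EngagementEvent("u3", "v1", "complete"),
        EngagementEvent("u4", "v1", "view"),
        EngagementEvent("u4", "v1", "complete"),
    ],
    store,
)
print(store.completion_rate("v1"))  # 0.75
```

The ranking stage would read features such as `completion_rate` at request time, so the store's read latency sits directly on the feed's critical path.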
Step 6 — Scale
- Video in object store + CDN with prefetch; transcoding runs on a large asynchronous worker fleet.
- Embeddings/ANN index sharded; feature store low-latency.
- Engagement event volume is enormous: process it as a stream for real-time interest updates and in batch for model training.
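Sharding the embedding index usually implies a scatter-gather query: every shard returns its local top-k, and a merge step keeps the global top-k. The sketch below assumes dot-product scoring and uses an exact scan per shard as a stand-in for each shard's ANN lookup; the shard contents are made up.

```python
import heapq


def search_shard(shard: dict, query, k: int):
    """Local top-k by dot product; stand-in for a per-shard ANN index lookup."""
    scored = (
        (sum(q * x for q, x in zip(query, emb)), video_id)
        for video_id, emb in shard.items()
    )
    return heapq.nlargest(k, scored)


def search_all(shards, query, k: int):
    """Scatter the query to every shard, then merge the partial results
    into a single global top-k list of video ids."""
    partials = [hit for shard in shards for hit in search_shard(shard, query, k)]
    return [video_id for _, video_id in heapq.nlargest(k, partials)]


shards = [
    {"v1": (1.0, 0.0), "v2": (0.2, 0.9)},
    {"v3": (0.9, 0.1), "v4": (0.5, 0.5)},
]
print(search_all(shards, query=(1.0, 0.0), k=2))  # ['v1', 'v3']
```

Asking each shard for k results is what makes the merge safe: any item in the global top-k is necessarily in its own shard's top-k (exactly so for an exact per-shard search, approximately so for ANN).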
Trade-offs to raise
- Recommendation recall (whole-catalog ANN) vs follow-graph fan-out — TikTok chooses recommendation, which is why new creators can go viral.
- Exploration vs exploitation — must show some new/uncertain content to learn and to give creators reach, at the cost of some short-term engagement.
- Real-time personalization vs cost — tighter loops cost more stream/compute.
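The exploration vs exploitation trade-off can be made tangible with an epsilon-greedy slot filler: each feed slot shows a fresh, unproven video with probability `epsilon` and the best remaining ranked video otherwise. Epsilon-greedy is a swapped-in textbook technique used here for illustration, not a claim about what TikTok runs; the function and parameter names are hypothetical.

```python
import random


def fill_feed(ranked, fresh, n, epsilon, rng):
    """Fill n slots. For each slot, explore (take a fresh video) with
    probability epsilon, otherwise exploit (take the next best-ranked video).
    Falls back to whichever pool still has items when the other runs out."""
    ranked, fresh = list(ranked), list(fresh)
    feed = []
    while len(feed) < n and (ranked or fresh):
        explore = bool(fresh) and (not ranked or rng.random() < epsilon)
        feed.append(fresh.pop(0) if explore else ranked.pop(0))
    return feed


rng = random.Random(42)  # seeded so runs are repeatable
ranked = ["r1", "r2", "r3", "r4"]
fresh = ["f1", "f2"]

print(fill_feed(ranked, fresh, n=4, epsilon=0.0, rng=rng))  # pure exploitation
print(fill_feed(ranked, fresh, n=4, epsilon=1.0, rng=rng))  # explore until fresh runs out
print(fill_feed(ranked, fresh, n=4, epsilon=0.2, rng=rng))  # mixed
```

Raising `epsilon` gives new creators more reach and the system more data about uncertain content, at the cost of showing fewer videos the model already expects to perform well.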
The interview cue
“Video ingested like Instagram but transcoded to ABR segments + CDN with prefetch for instant playback; the For You feed is a recommendation pipeline — candidate generation via ANN over embeddings (not a follow graph), ML ranking on predicted watch-time/engagement, diversity re-ranking — with a real-time engagement loop updating interest within the session.” Recommendation recall + ranking + fast feedback is the defining answer; implementation next.