Designing Spotify
Music streaming — audio delivery from a CDN with instant start, playlists and library at scale, and recommendation systems like Discover Weekly.
The problem
Design Spotify: stream a catalog of ~100M tracks with instant playback, manage playlists and libraries, and surface great recommendations (Discover Weekly, radio). Audio files are small compared with video, so the hard problems shift to instant start, playlist and social data at scale, and recommendations.
Step 1 — Requirements
Functional: search and play tracks; create/follow playlists; save a library; recommendations and radio; offline downloads; social features such as showing what friends are playing right now.
Non-functional: very low playback-start latency (music should start the instant you tap), read-heavy, scalable, highly available; audio quality adapts to the network; gapless playback.
Step 2 — Audio storage and delivery
- Each track is pre-encoded into a few bitrates (e.g. 96/160/320 kbps, Ogg/AAC), stored in an object store, served via CDN — same media plane as before, but files are KB–MB, not GB.
- Tracks are immutable → trivially cacheable; the CDN handles nearly all delivery.
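To make "files are KB–MB" and "immutable → trivially cacheable" concrete, here is a minimal sketch. It computes encoded sizes for the example bitrate ladder and builds content-addressed object keys. The key layout, the `.ogg` extension, and the helper names are hypothetical. The `immutable` Cache-Control directive is a real HTTP extension (RFC 8246), but the source does not say which headers Spotify sends.

```python
import hashlib

# Example ladder from the text; actual encoder settings are not specified in the source.
BITRATES_KBPS = (96, 160, 320)

# Immutable objects can be cached for a year; a re-encode gets a new key instead of overwriting.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"


def encoded_size_bytes(bitrate_kbps: int, duration_s: float) -> int:
    """Approximate constant-bitrate file size (ignores container overhead)."""
    return int(bitrate_kbps * 1000 / 8 * duration_s)


def object_key(track_id: str, bitrate_kbps: int, audio_bytes: bytes) -> str:
    """Hypothetical content-addressed key: it changes whenever the encoded bytes change."""
    digest = hashlib.sha256(audio_bytes).hexdigest()[:16]
    return f"audio/{track_id}/{bitrate_kbps}/{digest}.ogg"


if __name__ == "__main__":
    # A 3.5-minute (210 s) track at each rung of the ladder.
    for kbps in BITRATES_KBPS:
        print(f"{kbps} kbps -> {encoded_size_bytes(kbps, 210) / 1_000_000:.1f} MB")
```

For a 210-second track this prints about 2.5, 4.2, and 8.4 MB. A new encode never mutates an existing URL, so edges can keep objects for as long as they like without any invalidation.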
Step 3 — Instant playback (the UX core)
Music users expect zero perceptible startup delay. Techniques:
- Predictive prefetching — prefetch the next track in the queue/playlist (and the start of likely-next tracks) while the current plays, so skips and track changes are instant.
- Stream the first chunk immediately at a lower bitrate, then continue — start playing before the whole file arrives.
- Edge caching — popular tracks sit on CDN edges near the user.
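The prefetch and first-chunk ideas above can be sketched as a small queue-driven player. This is an illustration under assumptions. `fetch_range` stands in for an HTTP Range request to the CDN. `HEAD_BYTES` (the size of the "first chunk") and `lookahead` are hypothetical tuning knobs, not values from the source.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

HEAD_BYTES = 256 * 1024  # hypothetical: enough audio to start playback immediately


class PrefetchingPlayer:
    """Sketch: while one track plays, fetch the head of the next tracks in the queue."""

    def __init__(self, fetch_range: Callable[[str, int, int], bytes], lookahead: int = 2):
        # fetch_range(track_id, start, end_inclusive) models a CDN Range request.
        self.fetch_range = fetch_range
        self.lookahead = lookahead
        self.queue: Deque[str] = deque()
        self.buffer: Dict[str, bytes] = {}  # track_id -> bytes already downloaded

    def enqueue(self, track_ids: List[str]) -> None:
        self.queue.extend(track_ids)

    def prefetch_upcoming(self) -> None:
        # Only the head of each upcoming track: cheap if the user never reaches it.
        for track_id in list(self.queue)[: self.lookahead]:
            if track_id not in self.buffer:
                self.buffer[track_id] = self.fetch_range(track_id, 0, HEAD_BYTES - 1)

    def play_next(self) -> str:
        track_id = self.queue.popleft()
        if track_id not in self.buffer:
            # Slow path that prefetching is meant to avoid: fetch the head on demand.
            self.buffer[track_id] = self.fetch_range(track_id, 0, HEAD_BYTES - 1)
        # Playback would start from the buffered head while the rest streams in;
        # meanwhile, warm the tracks that are now next in line.
        self.prefetch_upcoming()
        return track_id
```

With `lookahead=1`, each call to `play_next` finds its track already buffered and then fetches the head of the following one. Raising `lookahead` makes skips safer but spends more bandwidth on tracks that may never play.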
Step 4 — Playlists, library, social (the data side)
This is where the volume is — billions of playlists, follows, saved tracks:
- Playlists — a playlist is an ordered list of track ids (plus metadata); the track audio is referenced, never duplicated. Sharded by playlist/user id.
- Library / saved tracks / follows — user-centric data, sharded by user; read-heavy.
- Collaborative playlists — concurrent edits from several users must be reconciled into one consistent track order, a lightweight version of the collaborative-editing problem.
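One way to picture "sharded ordered track-id lists" with concurrent edits is shown below. Each entry carries a fractional position, so an insert never renumbers its neighbours. Ties from simultaneous inserts at the same spot are broken by a unique entry id, which means every replica sorts to the same order. Fractional indexing is a swapped-in technique chosen for illustration; the source does not say how Spotify orders playlists. `NUM_SHARDS`, `shard_for`, and the entry-id scheme are hypothetical.

```python
import hashlib
from fractions import Fraction
from typing import List, Optional, Tuple

NUM_SHARDS = 64  # hypothetical shard count


def shard_for(key: str) -> int:
    """Stable routing by playlist/user id (md5 used only for an even spread, not security)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def position_between(left: Optional[Fraction], right: Optional[Fraction]) -> Fraction:
    """A sort key strictly between two neighbours (None means the list boundary)."""
    if left is None and right is None:
        return Fraction(1)
    if left is None:
        return right - 1
    if right is None:
        return left + 1
    return (left + right) / 2


class Playlist:
    """Ordered track ids; the audio is only referenced by id, never copied."""

    def __init__(self) -> None:
        # (position, entry_id, track_id); entry_id breaks ties between concurrent inserts.
        self.entries: List[Tuple[Fraction, str, str]] = []

    def apply(self, pos: Fraction, entry_id: str, track_id: str) -> None:
        """Apply an insert op from any editor; the final order does not depend on arrival order."""
        self.entries.append((pos, entry_id, track_id))
        self.entries.sort()

    def insert(self, left: Optional[Fraction], right: Optional[Fraction],
               entry_id: str, track_id: str) -> Fraction:
        pos = position_between(left, right)
        self.apply(pos, entry_id, track_id)
        return pos

    def track_ids(self) -> List[str]:
        return [track_id for _, _, track_id in self.entries]
```

`Fraction` avoids the precision exhaustion that repeated float midpoints would hit, though denominators still grow under heavy editing. A production design would also need deletes, moves, and periodic rebalancing.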
Step 5 — Recommendations
A major product surface:
- Collaborative filtering — “users who listen to X also listen to Y” from the massive listening matrix.
- Content/audio analysis — embeddings from audio features and NLP over playlists/reviews.
- Discover Weekly / radio — batch pipelines (like the news-feed/TikTok recall+rank) generating personalized track sets.
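To illustrate "users who listen to X also listen to Y," here is a toy item-item collaborative filter. It computes cosine similarity over binary user-listen vectors, then scores unheard tracks by their similarity to a user's history. The function names and data shape are hypothetical. The pairwise loop is quadratic and only suits toy data. At the scale of the full listening matrix, systems typically use matrix factorization or learned embeddings with approximate nearest-neighbour search.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def item_similarities(listens: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Cosine similarity between tracks, treating each track as a binary vector over users."""
    listeners: Dict[str, Set[str]] = defaultdict(set)  # track -> users who played it
    for user, tracks in listens.items():
        for track in tracks:
            listeners[track].add(user)
    sims: Dict[Tuple[str, str], float] = {}
    tracks = sorted(listeners)
    for i, a in enumerate(tracks):
        for b in tracks[i + 1:]:
            overlap = len(listeners[a] & listeners[b])
            if overlap:
                s = overlap / math.sqrt(len(listeners[a]) * len(listeners[b]))
                sims[(a, b)] = sims[(b, a)] = s
    return sims


def recommend(user: str, listens: Dict[str, Set[str]],
              sims: Dict[Tuple[str, str], float], k: int = 3) -> List[str]:
    """Rank tracks the user has not heard by summed similarity to tracks they have heard."""
    heard = listens[user]
    scores: Dict[str, float] = defaultdict(float)
    for (a, b), s in sims.items():
        if a in heard and b not in heard:
            scores[b] += s
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

A Discover Weekly-style job would run something like this in batch for every user and then rank and filter the candidates. Radio would reuse the same similarities, seeded by a single track.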
Step 6 — Architecture
play → API → playback authz → CDN audio (with prefetch of next track)
browse → catalog + search (inverted index) + recommendation service
playlists/library → sharded DBs (ordered track-id lists), heavily cached
listening events → stream → recommendation features + royalty accounting
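The "playback authz → CDN audio" hop is often implemented by having the authorization service hand out short-lived signed URLs that the CDN edge can verify without a database call. The sketch below uses an HMAC over path and expiry. This is a generic, hypothetical scheme: the source does not specify one, and real CDNs each define their own signed-URL format. `SECRET`, `sign_url`, and `verify_url` are illustrative names.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"hypothetical-key-shared-by-authz-and-cdn"


def _sig(path: str, expires: int) -> str:
    return hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_s: int = 300, now: Optional[float] = None) -> str:
    """Issued by the authz service after checking the user may play this track."""
    expires = int((time.time() if now is None else now) + ttl_s)
    return f"{path}?{urlencode({'expires': expires, 'sig': _sig(path, expires)})}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Checked at the CDN edge: valid signature and not yet expired."""
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    try:
        expires = int(qs["expires"][0])
        sig = qs["sig"][0]
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    return hmac.compare_digest(sig, _sig(parsed.path, expires))
```

Because the signature covers the path, a URL for one track cannot be edited to fetch another. The short TTL limits how long a leaked URL stays useful, while the audio object itself remains cacheable.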
Step 7 — Listening events (analytics + royalties)
Every play is an event streamed to a pipeline that powers recommendations and royalty accounting (who gets paid per stream) — high-volume, processed async/batched (reuse the analytics platform).
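A batch royalty job over the listening-event stream can be sketched as two steps: count qualifying streams per track, then split a payout pool by stream share. Both the 30-second threshold and the pro-rata split are assumptions made for illustration. The source only says that plays feed royalty accounting.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

MIN_PLAY_MS = 30_000  # assumption: a play counts as a stream after 30 seconds


def count_streams(events: Iterable[Tuple[str, str, int]]) -> Counter:
    """events are (user_id, track_id, ms_played); count qualifying streams per track."""
    counts: Counter = Counter()
    for _user_id, track_id, ms_played in events:
        if ms_played >= MIN_PLAY_MS:
            counts[track_id] += 1
    return counts


def pro_rata_payouts(counts: Counter, pool_cents: int) -> Dict[str, int]:
    """Split a pool by each track's share of total streams (integer cents, rounded down)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {track: pool_cents * n // total for track, n in counts.items()}
```

Because these are counts over an append-only log, the job can be re-run or backfilled safely. That is one reason the text suggests processing plays async and batched rather than on the playback path.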
Trade-offs to raise
- Prefetching the next track (instant skips, but wasted bandwidth when the prefetched track is never played) vs fetching on demand. Prefetch wins for UX.
- Audio bitrate vs bandwidth/quality — adaptive by network/plan.
- Batch recommendations (cheap, refreshed weekly) vs real-time (radio). Use both.
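The "adaptive by network/plan" trade-off can be reduced to a small selection rule: pick the highest rung of the bitrate ladder that fits within a safety fraction of measured throughput and within the listener's plan cap. The plan caps, the 0.5 safety factor, and the function name are hypothetical.

```python
from typing import Sequence

LADDER_KBPS = (96, 160, 320)  # the example ladder from Step 2
PLAN_CAP_KBPS = {"free": 160, "premium": 320}  # hypothetical per-plan caps


def choose_bitrate(throughput_kbps: float, plan: str,
                   ladder: Sequence[int] = LADDER_KBPS, safety: float = 0.5) -> int:
    """Highest rung within safety * throughput and the plan cap; fall back to the lowest rung."""
    cap = PLAN_CAP_KBPS.get(plan, min(ladder))  # unknown plans get the lowest quality
    budget = throughput_kbps * safety  # headroom so prefetch and jitter don't cause stalls
    fitting = [b for b in ladder if b <= budget and b <= cap]
    return max(fitting) if fitting else min(ladder)
```

For example, 1000 kbps of measured throughput yields 320 kbps on premium but 160 kbps on free, while a weak 250 kbps link drops to 96 kbps. A real client would also smooth its throughput estimate and avoid switching rungs mid-track so that playback stays gapless.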
The interview cue
“Tracks pre-encoded into a few bitrates in an object store + CDN (immutable, trivially cached); predictive prefetch of the next track + first-chunk streaming for instant playback; playlists/library as sharded ordered track-id lists, heavily cached; recommendations via collaborative filtering + audio embeddings (Discover Weekly as a batch pipeline); listening events streamed for recs + royalties.” Instant-start delivery + playlist data + recommendations is the answer; implementation next.