Designing Spotify

Music streaming — audio delivery from a CDN with instant start, playlists and library at scale, and recommendation systems like Discover Weekly.

The problem

Design Spotify: stream a catalog of ~100M tracks with instant playback, manage playlists/libraries, and surface great recommendations (Discover Weekly, radio). Audio files are small versus video, so the challenges shift to instant start, playlist/social data at scale, and recommendations.

Step 1 — Requirements

Functional: search and play tracks; create/follow playlists; save a library; recommendations and radio; offline downloads; show “now playing” and other social activity.

Non-functional: very low playback-start latency (music should start the instant you tap), read-heavy, scalable, available; audio quality adapts to network; gapless playback.

Step 2 — Audio storage and delivery

Each track is pre-encoded at a few bitrates (e.g. 96/160/320 kbps, Ogg/AAC), stored in an object store, and served via CDN. It is the same media plane as before, but files are KB–MB, not GB.
Tracks are immutable, so they are trivially cacheable; the CDN handles nearly all delivery.
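
To make "KB–MB, not GB" concrete, here is a back-of-envelope sketch of per-track and whole-catalog storage. The 3.5-minute average duration and the `object_key` layout are assumptions for illustration, not values from any real system; the bitrates and the ~100M catalog size come from the text above.

```python
BITRATES_KBPS = (96, 160, 320)   # bitrate ladder named in the text
AVG_DURATION_S = 210             # assumption: 3.5-minute average track
CATALOG_TRACKS = 100_000_000     # ~100M tracks, from the problem statement


def track_size_bytes(bitrate_kbps: int, duration_s: float) -> int:
    """Approximate encoded size: kilobits/s * 1000 * seconds / 8 bits per byte."""
    return int(bitrate_kbps * 1000 * duration_s / 8)


def object_key(track_id: str, bitrate_kbps: int) -> str:
    """Hypothetical object-store key. Content never changes under a key,
    so the CDN can cache it with a very long TTL."""
    return f"audio/{track_id}/{bitrate_kbps}.ogg"


# Size of one average track at each bitrate, in megabytes.
sizes_mb = {b: track_size_bytes(b, AVG_DURATION_S) / 1e6 for b in BITRATES_KBPS}

# Size of the full catalog with every bitrate stored, in petabytes.
catalog_pb = (
    CATALOG_TRACKS
    * sum(track_size_bytes(b, AVG_DURATION_S) for b in BITRATES_KBPS)
    / 1e15
)

if __name__ == "__main__":
    for bitrate, mb in sizes_mb.items():
        print(f"{bitrate} kbps: {mb:.2f} MB per track")
    print(f"whole catalog, all bitrates: {catalog_pb:.2f} PB")
```

Under these assumptions a track is about 2.5, 4.2, and 8.4 MB at the three bitrates, and the whole catalog is roughly 1.5 PB: large, but small next to a video catalog.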

Step 3 — Instant playback (the UX core)

Music users expect zero perceptible startup delay. Techniques:

Predictive prefetching — prefetch the next track in the queue/playlist (and the start of likely-next tracks) while the current plays, so skips and track changes are instant.
Stream the first chunk immediately at a lower bitrate, then continue — start playing before the whole file arrives.
Edge caching — popular tracks sit on CDN edges near the user.
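
The prefetching and first-chunk ideas above can be sketched as a small client-side planner. This is a minimal illustration, not a real player API: `PrefetchTask`, `plan_prefetch`, the 64 KiB head size, the lookahead of 3, and the bitrate choices are all assumed names and values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PrefetchTask:
    track_id: str
    bitrate_kbps: int
    # Inclusive byte range to fetch; an end of None means "to end of file".
    byte_range: Tuple[int, Optional[int]]


def plan_prefetch(
    queue: List[str],
    current_index: int,
    lookahead: int = 3,           # assumed: how many upcoming tracks to warm
    head_bytes: int = 64 * 1024,  # assumed: size of the "first chunk"
    start_bitrate: int = 96,      # low bitrate for a fast first chunk
    full_bitrate: int = 160,      # bitrate for the fully prefetched next track
) -> List[PrefetchTask]:
    """While the current track plays, fetch the next track in full and only
    the first chunk of the few tracks after it."""
    upcoming = queue[current_index + 1 : current_index + 1 + lookahead]
    tasks = []
    for offset, track_id in enumerate(upcoming):
        if offset == 0:
            # Next track: whole file, so a skip or track change is instant.
            tasks.append(PrefetchTask(track_id, full_bitrate, (0, None)))
        else:
            # Likely-next tracks: just enough bytes to start playing.
            tasks.append(PrefetchTask(track_id, start_bitrate, (0, head_bytes - 1)))
    return tasks
```

For a queue `["a", "b", "c", "d", "e"]` with `"a"` playing, the plan is the full file for `"b"` and the first 64 KiB of `"c"` and `"d"`. Each task maps naturally onto an HTTP range request against the CDN.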

Step 4 — Playlists and library data

This is where the volume is: billions of playlists, follows, and saved tracks.

Playlists — a playlist is an ordered list of track ids (plus metadata); the track audio is referenced, never duplicated. Sharded by playlist/user id.
Library / saved tracks / follows — user-centric data, sharded by user; read-heavy.
Collaborative playlists — concurrent edits reconciled (ordering, like a lightweight version of the collaborative-editing problem).
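
A minimal sketch of the playlist model described above: an ordered list of track ids, routed to a shard by a stable hash of its id. The shard count, field names, and the anchor-based insert are assumptions for illustration. Inserting "after track X" rather than "at index N" is one simple way to make concurrent edits less order-sensitive; it is not a full collaborative-editing algorithm.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional

NUM_SHARDS = 64  # assumed shard count


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash routing. Python's built-in hash() is randomized per
    process for strings, so a cryptographic digest is used instead."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


@dataclass
class Playlist:
    playlist_id: str
    owner_id: str
    name: str
    track_ids: List[str] = field(default_factory=list)  # references only, no audio
    version: int = 0  # bumped on every edit

    def insert_after(self, track_id: str, anchor: Optional[str] = None) -> None:
        """Insert track_id right after the anchor track.

        anchor=None inserts at the front. If the anchor was removed by a
        concurrent edit, fall back to appending at the end. If the anchor
        appears more than once, its first occurrence is used.
        """
        if anchor is None:
            pos = 0
        elif anchor in self.track_ids:
            pos = self.track_ids.index(anchor) + 1
        else:
            pos = len(self.track_ids)
        self.track_ids.insert(pos, track_id)
        self.version += 1
```

Two collaborators who both insert after `"t1"` end up with both tracks adjacent to `"t1"`, regardless of what either of them did elsewhere in the list. The user's library and follows would be routed the same way, with `shard_for(user_id)`.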

Step 5 — Recommendations

A major product surface:

Collaborative filtering — “users who listen to X also listen to Y” from the massive listening matrix.
Content/audio analysis — embeddings from audio features and NLP over playlists/reviews.
Discover Weekly / radio — batch pipelines (like the news-feed/TikTok recall+rank) generating personalized track sets.
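
The "users who listen to X also listen to Y" idea can be illustrated with a toy item-item co-occurrence model. This is a deliberately simple stand-in for collaborative filtering on a tiny in-memory dataset; production systems typically use matrix factorization or learned embeddings, and all names and data here are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, List


def cooccurrence(listens: Dict[str, List[str]]) -> Dict[str, Counter]:
    """Count, for each pair of tracks, how many users listened to both."""
    co: Dict[str, Counter] = defaultdict(Counter)
    for tracks in listens.values():
        for a, b in combinations(sorted(set(tracks)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(user: str, listens: Dict[str, List[str]],
              co: Dict[str, Counter], k: int = 3) -> List[str]:
    """Score unheard tracks by co-occurrence with the user's heard tracks."""
    heard = set(listens[user])
    scores: Counter = Counter()
    for track in heard:
        for other, count in co.get(track, Counter()).items():
            if other not in heard:
                scores[other] += count
    # Highest score first; ties broken by track id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked[:k]]


# Toy listening matrix: user -> tracks played.
listens = {
    "ann": ["x", "y"],
    "bob": ["x", "y", "z"],
    "cat": ["x", "z"],
    "dan": ["x"],
}
co = cooccurrence(listens)
print(recommend("ann", listens, co))  # ['z']
print(recommend("dan", listens, co))  # ['y', 'z']
```

A weekly batch job in this style would recompute the co-occurrence table from the listening events and write each user's top-k list to a store the app reads from.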

Step 6 — Architecture

play → API → playback authz → CDN audio (with prefetch of next track)
browse → catalog + search (inverted index) + recommendation service
playlists/library → sharded DBs (ordered track-id lists), heavily cached
listening events → stream → recommendation features + royalty accounting
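
The browse path above names an inverted index for search. As a minimal sketch, assuming an in-memory catalog of track titles and a simple lowercase tokenizer (both illustrative), the index maps each token to the set of track ids containing it, and a multi-word query intersects those sets.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(catalog: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each token to the ids of tracks whose title contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for track_id, title in catalog.items():
        for token in tokenize(title):
            index[token].add(track_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return ids of tracks matching every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start from a copy so the index's own sets are never mutated.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Made-up catalog: track id -> title.
catalog = {"t1": "Blue Monday", "t2": "Blue in Green", "t3": "Monday Morning"}
index = build_index(catalog)
print(search(index, "blue"))         # ['t1', 't2']
print(search(index, "Blue Monday"))  # ['t1']
```

A real search service would add ranking, prefix and typo tolerance, and artist/album fields, but the lookup structure is the same.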

Step 7 — Listening events (analytics + royalties)

Every play is an event streamed to a pipeline that powers recommendations and royalty accounting (who gets paid per stream) — high-volume, processed async/batched (reuse the analytics platform).
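
A sketch of the batch side of this pipeline: fold a batch of play events into per-track and per-rights-holder stream counts. The event fields, the `rights_holder_of` mapping, and the optional `min_ms` cutoff are assumptions for illustration; the actual rules for what counts as a payable stream are a business decision not covered here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PlayEvent:
    user_id: str
    track_id: str
    ms_played: int


def aggregate(
    events: Iterable[PlayEvent],
    rights_holder_of: Dict[str, str],  # hypothetical track_id -> rights holder
    min_ms: int = 0,                   # assumed cutoff; 0 counts every play
) -> Tuple[Counter, Counter]:
    """Count qualifying streams per track and per rights holder."""
    per_track: Counter = Counter()
    per_holder: Counter = Counter()
    for event in events:
        if event.ms_played < min_ms:
            continue  # too short to count under the chosen cutoff
        per_track[event.track_id] += 1
        per_holder[rights_holder_of[event.track_id]] += 1
    return per_track, per_holder


# Made-up batch of events.
events = [
    PlayEvent("u1", "t1", 200_000),
    PlayEvent("u2", "t1", 5_000),
    PlayEvent("u2", "t2", 180_000),
]
holders = {"t1": "label_a", "t2": "label_b"}
per_track, per_holder = aggregate(events, holders, min_ms=30_000)
print(per_track)   # Counter({'t1': 1, 't2': 1})
print(per_holder)  # Counter({'label_a': 1, 'label_b': 1})
```

Because the counts are simple sums, batches can be aggregated independently and merged later, which suits async processing of a high-volume stream.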

Trade-offs to raise

Prefetch next track (instant skips, but wasted bandwidth when the prefetched track is never played) vs on-demand. Prefetch wins for UX.
Audio bitrate vs bandwidth/quality — adaptive by network/plan.
Batch recommendations (cheap, refreshed weekly) vs realtime (radio). Use both.
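
The bitrate trade-off can be sketched as a small selection function: pick the highest bitrate that fits both the user's plan and the measured network throughput. The plan caps and the 0.7 headroom factor are assumptions for illustration, not real product values.

```python
BITRATES_KBPS = (96, 160, 320)                  # ladder from the text
PLAN_CAP_KBPS = {"free": 160, "premium": 320}   # assumed per-plan ceilings


def pick_bitrate(throughput_kbps: float, plan: str, headroom: float = 0.7) -> int:
    """Choose the highest bitrate allowed by the plan that uses at most
    `headroom` of the measured throughput; fall back to the lowest rung."""
    cap = PLAN_CAP_KBPS[plan]
    budget = throughput_kbps * headroom
    candidates = [b for b in BITRATES_KBPS if b <= cap and b <= budget]
    return max(candidates) if candidates else min(BITRATES_KBPS)


print(pick_bitrate(1000, "premium"))  # 320
print(pick_bitrate(1000, "free"))     # 160
print(pick_bitrate(200, "premium"))   # 96
```

Leaving headroom below the measured throughput trades a little quality for fewer stalls when the network dips.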

The interview cue

“Tracks pre-encoded into a few bitrates in an object store + CDN (immutable, trivially cached); predictive prefetch of the next track + first-chunk streaming for instant playback; playlists/library as sharded ordered track-id lists, heavily cached; recommendations via collaborative filtering + audio embeddings (Discover Weekly as a batch pipeline); listening events streamed for recs + royalties.” Instant-start delivery + playlist data + recommendations is the answer; implementation next.