Building Spotify
Implement instant-start playback with next-track prefetch, playlist storage as ordered track-id lists, and the listening-event pipeline.
Playback with instant start + prefetch
Start the current track from a low-bitrate first chunk and prefetch the next track so skips/changes are instant:
def play(user, track_id, context):  # context = playlist/queue
    if not entitled(user, track_id):
        return 403
    bitrate = choose_bitrate(user.plan, network())
    url = cdn_url(track_audio(track_id, bitrate))
    prefetch_next(context)  # warm the next track(s)
    listening_events.publish({"user": user.id, "track": track_id, "ts": now()})
    return {"stream_url": url, "start_chunk": cdn_url(track_first_chunk(track_id, bitrate))}

def prefetch_next(context):
    for nxt in context.upcoming(n=2):  # next 1-2 tracks
        client.prefetch(cdn_url(track_first_chunk(nxt, default_bitrate)))
First-chunk + prefetch is what makes playback and skips feel instantaneous.
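The `choose_bitrate(user.plan, network())` call above is left undefined. One way it could work is sketched below. The encoding ladder, the per-plan caps, and the `headroom` factor are illustrative assumptions, not values from this design, and `measured_kbps` stands in for whatever `network()` returns.

```python
from typing import Sequence

# Hypothetical encoding ladder and per-plan caps in kbit/s (illustrative values).
LADDER_KBPS: Sequence[int] = (24, 96, 160, 320)
PLAN_CAP_KBPS = {"free": 160, "premium": 320}


def choose_bitrate(plan: str, measured_kbps: float, headroom: float = 0.5) -> int:
    """Pick the highest ladder rung that fits the plan cap and the measured throughput."""
    # Unknown plans fall back to the most restrictive cap.
    cap = PLAN_CAP_KBPS.get(plan, min(PLAN_CAP_KBPS.values()))
    # Spend only a fraction of the measured throughput to leave room for jitter.
    budget = measured_kbps * headroom
    fitting = [b for b in LADDER_KBPS if b <= cap and b <= budget]
    # If nothing fits, fall back to the lowest rung rather than refusing to play.
    return max(fitting) if fitting else min(LADDER_KBPS)
```

Falling back to the lowest rung on a poor connection keeps the start instant, at the cost of audio quality for the first seconds.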
Audio storage
Each track → a few pre-encoded bitrates in the object store, fronted by the CDN. Tracks are immutable and content-addressed, so caching is trivial (no invalidation) and identical files dedupe. The CDN serves nearly all audio bytes.
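A minimal sketch of why content addressing removes invalidation and gives deduplication for free. It assumes SHA-256 as the hash and uses an in-memory dictionary as a stand-in for the object store; `ContentAddressedStore` is a hypothetical name.

```python
import hashlib
from typing import Dict


class ContentAddressedStore:
    """In-memory stand-in for an object store keyed by content hash."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the bytes, so a key can never point at
        # different content later: caches never need invalidation.
        key = hashlib.sha256(data).hexdigest()
        # Identical bytes map to the same key, so a re-upload stores nothing new.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)
```

Uploading the same encoded file twice returns the same key and leaves a single stored copy.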
Playlists as ordered track-id lists
A playlist references tracks; it never stores audio:
# playlist: { id, owner, name, tracks: [track_id, ...], updated_at, collaborative }
def add_to_playlist(playlist_id, track_id, pos, editor):
    pl = playlists.get(playlist_id)
    authorize(editor, pl)
    pl.tracks.insert(pos, track_id)  # ordered list of ids
    playlists.put(playlist_id, pl)   # sharded by playlist/user id
Library, saved tracks, and follows are similarly user-sharded, read-heavy, and cached. A collaborative playlist reconciles concurrent edits with a per-playlist version number and lightweight, operational-transform-style (OT-lite) ordering of edits.
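The versioning half of that reconciliation can be sketched as optimistic concurrency: each edit names the version it was based on, and a stale edit is rejected so the client can re-read and retry. This is simpler than operational transforms and is only an illustration; `Playlist`, `insert_track`, and `VersionConflict` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


class VersionConflict(Exception):
    """Raised when an edit was based on a stale playlist version."""


@dataclass
class Playlist:
    tracks: List[str] = field(default_factory=list)
    version: int = 0


def insert_track(pl: Playlist, track_id: str, pos: int, base_version: int) -> int:
    """Apply an insert only if the editor saw the latest version; return the new version."""
    if base_version != pl.version:
        raise VersionConflict(
            f"edit based on v{base_version}, playlist is at v{pl.version}"
        )
    pos = max(0, min(pos, len(pl.tracks)))  # clamp to a valid index
    pl.tracks.insert(pos, track_id)
    pl.version += 1
    return pl.version
```

When two editors both start from version 1, the first insert succeeds and bumps the playlist to version 2; the second raises `VersionConflict` and leaves the track list untouched.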
Hydrating a playlist for display
Reading a playlist returns track ids; hydrate track metadata in a batched cache read (the metadata is hot and small):
def view_playlist(playlist_id):
    pl = playlists.get(playlist_id)
    tracks = track_metadata.batch_get(pl.tracks)  # cached; audio fetched only on play
    return {"name": pl.name, "tracks": tracks}
Listening events (recs + royalties)
# every play streams an event:
#   { event_id, user, track, artist, ts, ms_played, context }
# → stream processor:
#   - update recommendation features (collaborative-filtering matrix, embeddings)
#   - count a "stream" for ROYALTIES once ms_played >= 30,000 (30 s, the payable threshold)
#   - batch into the data warehouse for Discover Weekly (weekly batch job)
Counting and royalties are asynchronous and exactly-once-sensitive because money is involved: deduplicate by event id and reconcile in batch.
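A minimal sketch of the dedup-then-threshold step. It assumes each event carries a unique `event_id` field and treats the 30-second threshold as inclusive; both are assumptions for illustration. A real pipeline would keep the seen-id set in a durable store rather than in memory.

```python
from collections import Counter
from typing import Iterable, Mapping, Set

PAYABLE_MS = 30_000  # the 30 s payable threshold, expressed in milliseconds


def count_payable_streams(events: Iterable[Mapping]) -> Counter:
    """Count payable streams per track, ignoring redelivered events."""
    seen_ids: Set[str] = set()
    streams: Counter = Counter()
    for ev in events:
        if ev["event_id"] in seen_ids:
            continue  # duplicate delivery of an event already processed
        seen_ids.add(ev["event_id"])
        if ev["ms_played"] >= PAYABLE_MS:
            streams[ev["track"]] += 1
    return streams
```

Running the same function over the full day's events in the warehouse gives the batch figure that the streaming counts are reconciled against.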
Recommendations
- Realtime radio / autoplay: extend the queue with the nearest neighbors of the current track in embedding space, found by approximate nearest-neighbor (ANN) search, like TikTok recall.
- Discover Weekly: a weekly batch pipeline in which collaborative filtering plus audio/NLP embeddings produce a personalized 30-track set per user, materialized to a playlist.
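The autoplay path can be illustrated with an exact brute-force scan over cosine similarity. This stands in for the ANN index: a production system would use an approximate index to avoid scanning every track. The function names and the toy two-dimensional embeddings in the usage note are assumptions for illustration.

```python
import math
from typing import AbstractSet, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extend_queue(
    seed_id: str,
    embeddings: Dict[str, Sequence[float]],
    k: int = 2,
    exclude: AbstractSet[str] = frozenset(),
) -> List[str]:
    """Return the k tracks most similar to the seed, skipping excluded ids."""
    seed = embeddings[seed_id]
    scored = [
        (cosine(seed, vec), tid)
        for tid, vec in embeddings.items()
        if tid != seed_id and tid not in exclude
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [tid for _, tid in scored[:k]]
```

With embeddings `{"a": [1, 0], "b": [0.9, 0.1], "c": [0, 1], "d": [-1, 0]}`, seeding from `"a"` with `k=2` returns `["b", "c"]`. Passing recently played ids as `exclude` keeps autoplay from repeating tracks.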
Scale and failure handling
- Audio → CDN absorbs it; immutable + cached, minimal origin load.
- Playlist/library DBs → sharded by user/playlist, cached; hot for reads.
- CDN miss (obscure track) → one origin fetch, then cached.
- Prefetch waste (user skips a prefetched track) → acceptable cost for instant UX; bound prefetch to 1–2 tracks.
- Event pipeline lag → recs/royalties trail slightly (fine); royalties reconciled in batch for accuracy.
The takeaway
Concrete signals: first-chunk + next-track prefetch for instant playback, immutable audio on the CDN, playlists/library as sharded ordered id-lists (audio referenced, not copied), and a listening-event pipeline feeding recommendations and threshold-based royalty accounting. Because audio files are small, the focus shifts from transcoding to instant-start delivery and to playlist and recommendation data.