Designing Instagram · Lyte Code

Photo sharing at scale — media upload to a blob store and CDN, a fan-out feed like Twitter's, and the storage split between metadata and images.

The problem

Design Instagram: users upload photos/videos, follow others, and see a feed of recent posts from people they follow, plus profiles and likes/comments. It combines the Twitter feed (fan-out) with large media handling (the new ingredient here).

Step 1 — Requirements

Functional: upload a photo (with caption); follow users; view a home feed of followees’ posts; view profiles; like/comment. (Stories/explore optional.)

Non-functional: read-heavy; low-latency feed and image loads; durable media storage; scale to billions of photos; eventual consistency is acceptable.

Step 2 — Estimate

Say 500M photos/day → ~6k writes/sec. Feed/image reads are far higher.
Average photo ~2 MB → 500M × 2 MB ≈ 1 PB/day of media, so this is a storage and bandwidth problem first. Media dwarfs everything else → keep it in a blob store behind a CDN, never in the DB.
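The arithmetic behind those two estimates can be checked in a few lines. This is only a back-of-envelope sketch; it uses the figures from the text (500M photos/day, ~2 MB each) and assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

photos_per_day = 500_000_000   # figure from the estimate above
avg_photo_bytes = 2 * 10**6    # ~2 MB, decimal units (assumption)

# Write rate: uploads spread evenly over the day (peaks will be higher).
writes_per_sec = photos_per_day / SECONDS_PER_DAY

# Raw media volume per day, before any resized variants are added.
media_bytes_per_day = photos_per_day * avg_photo_bytes
media_pb_per_day = media_bytes_per_day / 10**15

print(f"writes/sec ~ {writes_per_sec:,.0f}")      # writes/sec ~ 5,787
print(f"media/day  ~ {media_pb_per_day:.1f} PB")  # media/day  ~ 1.0 PB
```

The write rate is modest for a database; the petabyte per day is what forces the blob store and CDN.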

Step 3 — The two halves

Media plane — upload images/videos to an object store (S3), serve via a CDN; generate multiple resized variants (thumbnail, feed, full) on upload so clients fetch the right size.
Metadata + feed plane — posts, users, follows, likes in databases; the feed built by fan-out exactly like Twitter.

upload → app → object store (original + resized variants) → CDN
              → metadata DB (post: id, user, caption, media_urls, ts)
feed → fan-out (push to followers' feeds) → read precomputed feed → hydrate + CDN images
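To make the "original + resized variants" step concrete, here is a minimal sketch of planning the variants for one upload and building the media_urls stored on the post. The target widths, the object-key layout, and the CDN host are all hypothetical; the text only names the three variants (thumbnail, feed, full).

```python
from dataclasses import dataclass

# Hypothetical target widths in pixels; not taken from the text.
VARIANT_WIDTHS = {"thumbnail": 150, "feed": 640, "full": 1080}
CDN_BASE = "https://cdn.example.com"  # hypothetical CDN host


@dataclass(frozen=True)
class Variant:
    name: str
    width: int
    height: int
    key: str  # object-store key; this layout is illustrative only


def plan_variants(post_id, orig_w, orig_h):
    """Compute resized dimensions and storage keys for one uploaded image."""
    variants = []
    for name, target_w in VARIANT_WIDTHS.items():
        w = min(target_w, orig_w)               # never upscale a small image
        h = max(1, round(orig_h * w / orig_w))  # preserve the aspect ratio
        variants.append(Variant(name, w, h, f"media/{post_id}/{name}.jpg"))
    return variants


def media_urls(variants):
    """Map variant name to the CDN URL that goes into the post metadata."""
    return {v.name: f"{CDN_BASE}/{v.key}" for v in variants}


variants = plan_variants(post_id=42, orig_w=4000, orig_h=3000)
print(media_urls(variants)["feed"])  # https://cdn.example.com/media/42/feed.jpg
```

The actual pixel resizing would be done by an image library in the worker; this sketch only covers the bookkeeping that ties variants to the post record.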

Step 4 — Upload pipeline

Client uploads the image (often directly to the blob store via a pre-signed URL, bypassing app servers).
An async worker generates resized/transcoded variants and stores them.
The post metadata (with media URLs) is written and fanned out to followers.

Doing resizing async keeps the upload fast; serving pre-sized variants keeps feeds light.
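The idea behind a pre-signed URL is that the app server signs "this key may be written until this time" and the blob store verifies the signature without the upload bytes ever touching the app server. The sketch below illustrates that idea with a plain HMAC from the standard library. It is not S3's actual signing scheme, and the secret, host name, and query parameter names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by app server and blob store


def _sign(method, key, expires):
    """HMAC over the method, object key, and expiry timestamp."""
    msg = f"{method}\n{key}\n{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign_upload(key, ttl_s=300, now=None):
    """App server: issue a short-lived URL the client can PUT to directly."""
    current = time.time() if now is None else now
    expires = int(current + ttl_s)
    query = urlencode({"expires": expires, "sig": _sign("PUT", key, expires)})
    return f"https://blobs.example.com/{key}?{query}"


def verify_upload(url, now=None):
    """Blob store: accept the PUT only if the URL is unexpired and untampered."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    key = parsed.path.lstrip("/")
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig, _sign("PUT", key, expires))


url = presign_upload("uploads/u1/p42.jpg", ttl_s=300, now=1000)
print(verify_upload(url, now=1100))  # True: within the 300 s window
print(verify_upload(url, now=2000))  # False: expired
```

In a real deployment the object store's own SDK would generate and check these URLs; the point here is only that the app server hands out a capability instead of proxying the bytes.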

Step 5 — The feed (reuse Twitter)

Same hybrid fan-out as Twitter: push posts into followers’ precomputed feeds (Redis lists) so a feed read is a single list fetch rather than a query per followee; for celebrities (huge follower counts), pull-and-merge at read time to avoid the write storm. The feed stores post ids; reads hydrate post metadata and fetch images from the CDN. (See the Twitter lessons for the fan-out detail; don’t re-derive it, just say “same hybrid fan-out.”)
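As a reminder of what "same hybrid fan-out" means in practice, here is a minimal in-memory sketch. Plain dicts and deques stand in for Redis lists and the posts table, and the celebrity threshold and feed cap are hypothetical values kept tiny for the demo.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3  # hypothetical cutoff; real systems use far larger values
FEED_CAP = 500           # hypothetical max entries kept per precomputed feed

followers = defaultdict(set)         # author -> users who follow them
following = defaultdict(set)         # user -> authors they follow
posts_by_author = defaultdict(list)  # author -> [(ts, post_id)], oldest first
feeds = defaultdict(lambda: deque(maxlen=FEED_CAP))  # stands in for Redis lists


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author, post_id, ts):
    posts_by_author[author].append((ts, post_id))
    if not is_celebrity(author):
        # Push path: write the post id into every follower's feed, newest first.
        for user in followers[author]:
            feeds[user].appendleft((ts, post_id))


def read_feed(user, limit=10):
    pushed = list(feeds[user])
    # Pull path: fetch recent posts of followed celebrities at read time.
    pulled = [
        post
        for author in following[user]
        if is_celebrity(author)
        for post in posts_by_author[author][-limit:]
    ]
    merged = sorted(set(pushed + pulled), reverse=True)  # newest first, deduped
    return [post_id for _, post_id in merged[:limit]]


for fan in ("a", "b", "c"):
    follow(fan, "star")          # "star" reaches the celebrity threshold
follow("a", "bob")
publish("bob", post_id=1, ts=1)   # pushed into a's feed
publish("star", post_id=2, ts=2)  # not pushed; pulled at read time
publish("bob", post_id=3, ts=3)
print(read_feed("a"))  # [3, 2, 1]
print(read_feed("b"))  # [2]
```

The returned ids would then be hydrated with post metadata and CDN image URLs, as described above.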

Step 6 — Storage and sharding

Media in the object store (durable, erasure-coded), served by CDN; this is the biggest cost.
Posts sharded by post id (even spread) or by user id (one user's posts stay together for profile reads); social graph sharded by user; feeds in Redis.
Likes/comments — counts kept as async approximate counters; comment lists paginated.
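Two of the ideas above fit in a short sketch: routing a record to a shard by hashing its key, and keeping like counts as buffered, approximate counters that are flushed asynchronously. The shard count, the choice of MD5 as the hash, and the class and method names are assumptions made for illustration.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(key):
    """Stable hash of a user id or post id onto a shard number.

    Python's built-in hash() is salted per process for strings, so a
    digest is used to get the same answer on every server.
    """
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class BufferedLikeCounter:
    """Likes accumulate in memory and reach the stored count only on flush."""

    def __init__(self):
        self.pending = Counter()  # increments not yet persisted
        self.stored = Counter()   # stands in for the count column in the DB

    def like(self, post_id):
        self.pending[post_id] += 1

    def flush(self):
        # In a real system a background worker would run this periodically.
        self.stored.update(self.pending)
        self.pending.clear()

    def count(self, post_id):
        # May lag behind reality until the next flush (eventual consistency).
        return self.stored[post_id]


likes = BufferedLikeCounter()
for _ in range(3):
    likes.like(post_id=42)
print(likes.count(42))  # 0: nothing flushed yet
likes.flush()
print(likes.count(42))  # 3
```

Batching turns many small writes on a hot post into one update, at the price of a count that is briefly stale, which the requirements already accept.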

Trade-offs to raise

Resize on upload (storage for variants, fast reads) vs on the fly (compute per read). Pre-resize wins for a read-heavy feed.
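Hybrid fan-out (as in Twitter): push costs one write per follower but keeps reads cheap; pull avoids the celebrity write storm at the cost of a read-time merge.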
Eventual consistency of feeds and counts — fine.

The interview cue

“It’s the Twitter feed plus media: upload images to an object store via pre-signed URLs, async-generate resized variants, serve through a CDN; the home feed is the same hybrid fan-out (push for normal users, pull for celebrities) over post ids, hydrated with CDN image URLs. Metadata in sharded DBs, feeds in Redis.” Media plane (blob+CDN+variants) + reused fan-out is the answer; implementation next.