Designing Instagram
Photo sharing at scale — media upload to a blob store and CDN, a fan-out feed like Twitter's, and the storage split between metadata and images.
The problem
Design Instagram: users upload photos/videos, follow others, and see a feed of recent posts from people they follow, plus profiles and likes/comments. It combines the Twitter feed (fan-out) with large media handling (the new ingredient here).
Step 1 — Requirements
Functional: upload a photo (with caption); follow users; view a home feed of followees’ posts; view profiles; like/comment. (Stories/explore optional.)
Non-functional: read-heavy workload; low-latency feed and image loads; durable media storage; scale to billions of photos; eventual consistency is acceptable.
Step 2 — Estimate
- Say 500M photos/day → ~6k writes/sec. Feed/image reads are far higher.
- Average photo ~2 MB → 500M × 2 MB ≈ 1 PB/day of media → this is a storage and bandwidth problem first. Media dwarfs everything → blob store + CDN, never the DB.
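The arithmetic behind those two bullets can be checked in a few lines. This is a sketch using only the figures stated above (500M photos/day, ~2 MB each) and decimal units; the average ingest rate is derived from them and is not a number from the lesson.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400

photos_per_day = 500_000_000          # figure from the estimate above
avg_photo_bytes = 2 * 10**6           # ~2 MB per photo, decimal units

# Uploads per second, averaged over the day (peaks will be higher).
writes_per_sec = photos_per_day / SECONDS_PER_DAY

# Raw media volume per day, before resized variants or replication.
bytes_per_day = photos_per_day * avg_photo_bytes
petabytes_per_day = bytes_per_day / 10**15

# Average sustained ingest bandwidth implied by that volume.
ingest_gb_per_sec = bytes_per_day / SECONDS_PER_DAY / 10**9

print(f"{writes_per_sec:,.0f} uploads/sec")
print(f"{petabytes_per_day:.1f} PB/day")
print(f"{ingest_gb_per_sec:.1f} GB/s average ingest")
```

The write rate is modest for a database; the petabyte per day is what forces media into an object store and CDN.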
Step 3 — The two halves
- Media plane — upload images/videos to an object store (S3), serve via a CDN; generate multiple resized variants (thumbnail, feed, full) on upload so clients fetch the right size.
- Metadata + feed plane — posts, users, follows, likes in databases; the feed built by fan-out exactly like Twitter.
upload → app → object store (original + resized variants) → CDN
             → metadata DB (post: id, user, caption, media_urls, ts)
feed → fan-out (push to followers' feeds) → read precomputed feed → hydrate + CDN images
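To make the metadata record in the diagram concrete, here is a minimal sketch of the post row and of the "hydrate" step. The field names come from the diagram; `CDN_BASE`, the URL layout, `make_post`, and `hydrate` are hypothetical names chosen for illustration, and a plain dict stands in for the metadata DB.

```python
from dataclasses import dataclass

CDN_BASE = "https://cdn.example.com"   # hypothetical CDN hostname


@dataclass(frozen=True)
class Post:
    id: str
    user: str
    caption: str
    media_urls: dict   # variant name -> CDN URL; only URLs live in the DB
    ts: int


def make_post(post_id, user, caption, ts, variants=("thumb", "feed", "full")):
    """Build the metadata row; the image bytes themselves stay in the object store."""
    urls = {v: f"{CDN_BASE}/posts/{post_id}/{v}.jpg" for v in variants}
    return Post(post_id, user, caption, urls, ts)


def hydrate(feed_ids, posts_by_id, variant="feed"):
    """Turn feed post ids into render-ready items, skipping posts that no longer exist."""
    items = []
    for pid in feed_ids:
        post = posts_by_id.get(pid)
        if post is None:
            continue
        items.append({
            "id": post.id,
            "user": post.user,
            "caption": post.caption,
            "image": post.media_urls[variant],
        })
    return items
```

The point of the split is visible in the types: the row is a few hundred bytes of text, while the megabytes of image data are only referenced by URL.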
Step 4 — Upload pipeline
- Client uploads the image (often directly to the blob store via a pre-signed URL, bypassing app servers).
- An async worker generates resized/transcoded variants and stores them.
- The post metadata (with media URLs) is written and fanned out to followers.
Doing resizing async keeps the upload fast; serving pre-sized variants keeps feeds light.
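The pipeline above can be sketched with the standard library alone. Everything named here is an assumption for illustration: the HMAC scheme is a simplified stand-in for a pre-signed URL (it is not S3's actual signing algorithm), and `SECRET`, `blobs.example.com`, the key layout, and the variant widths are made up.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET = b"demo-signing-key"                    # hypothetical signing key
VARIANT_WIDTHS = {"thumb": 150, "feed": 1080}   # assumed target widths


def _sign(key: str, expires_at: int) -> str:
    msg = f"PUT\n{key}\n{expires_at}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def presign_upload(key: str, expires_at: int) -> str:
    """App server: hand the client a URL that authorises one PUT to `key`."""
    query = urlencode({"expires": expires_at, "sig": _sign(key, expires_at)})
    return f"https://blobs.example.com/{key}?{query}"


def verify_upload(key: str, expires_at: int, sig: str, now: int) -> bool:
    """Blob store: accept the upload only if the signature matches and is unexpired."""
    return now <= expires_at and hmac.compare_digest(_sign(key, expires_at), sig)


def plan_variants(post_id: str, width: int, height: int) -> dict:
    """Async worker: decide the key and dimensions of each variant to generate."""
    plan = {"full": {"key": f"posts/{post_id}/full.jpg", "width": width, "height": height}}
    for name, target_w in VARIANT_WIDTHS.items():
        w = min(target_w, width)                # never upscale a small original
        h = max(1, round(height * w / width))   # preserve aspect ratio
        plan[name] = {"key": f"posts/{post_id}/{name}.jpg", "width": w, "height": h}
    return plan
```

Because the signature covers the key and expiry, the app server never touches the image bytes; it only issues the URL and later records the metadata.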
Step 5 — The feed (reuse Twitter)
Same hybrid fan-out as Twitter: push posts into followers’ precomputed feeds (Redis lists) so a feed read is a single list fetch rather than a query per followee; for celebrities (huge follower counts), pull-and-merge at read time to avoid the write storm. The feed stores post ids only; reads hydrate post metadata and fetch images from the CDN. (See the Twitter lessons for the fan-out detail; don’t re-derive it, just say “same hybrid fan-out.”)
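For reference, the hybrid can be sketched in memory. This is an illustration under stated assumptions, not the production design: plain dicts and deques stand in for Redis lists and the posts table, `CELEBRITY_THRESHOLD` and `FEED_CAP` are made-up values, and posts are assumed to be published in timestamp order.

```python
import heapq
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3   # tiny for the demo; a real cutoff would be far larger
FEED_CAP = 500            # assumed cap on ids kept per precomputed feed

followers = defaultdict(set)                          # author -> users following them
following = defaultdict(set)                          # user -> authors they follow
feeds = defaultdict(lambda: deque(maxlen=FEED_CAP))   # user -> (ts, post_id), newest first
posts_by_author = defaultdict(list)                   # author -> (ts, post_id), newest first


def follow(user, author):
    followers[author].add(user)
    following[user].add(author)


def is_celebrity(author):
    return len(followers[author]) >= CELEBRITY_THRESHOLD


def publish(author, post_id, ts):
    posts_by_author[author].insert(0, (ts, post_id))
    if is_celebrity(author):
        return                                   # pull path: no write storm
    for user in followers[author]:               # push path: fan out on write
        feeds[user].appendleft((ts, post_id))


def read_feed(user, limit=10):
    """Merge the precomputed feed with followed celebrities' recent posts."""
    sources = [list(feeds[user])]
    sources += [posts_by_author[a] for a in following[user] if is_celebrity(a)]
    merged = heapq.merge(*sources, key=lambda entry: entry[0], reverse=True)
    seen, result = set(), []
    for _, post_id in merged:
        if post_id not in seen:                  # a post may sit in both paths
            seen.add(post_id)
            result.append(post_id)
        if len(result) == limit:
            break
    return result
```

The returned ids are what gets hydrated; the celebrity's post never occupies space in any follower's stored feed.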
Step 6 — Storage and sharding
- Media in the object store (durable, erasure-coded), served by CDN; the biggest cost.
- Posts sharded by post id or by user id (sharding by user id keeps one profile's posts on a single shard); social graph sharded by user; feeds in Redis.
- Likes/comments — counts kept as async approximate counters; comment lists paginated.
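Two of the ideas above fit in a short sketch: routing a user to a shard with a stable hash, and keeping like counts approximate by buffering increments and flushing them in batches. `NUM_SHARDS`, `shard_for`, and `LikeCounter` are hypothetical names, and in-memory `Counter` objects stand in for the database and the write buffer.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16   # assumed shard count


def shard_for(user_id: str) -> int:
    """Stable routing: the same user id maps to the same shard on every server."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


class LikeCounter:
    """Approximate counter: likes are buffered, then applied in one batch."""

    def __init__(self):
        self.durable = Counter()   # stands in for the stored count
        self.pending = Counter()   # increments not yet flushed

    def like(self, post_id):
        self.pending[post_id] += 1          # cheap; no DB write per like

    def flush(self):
        """Run periodically by a background worker."""
        self.durable.update(self.pending)   # one batched write per post
        self.pending.clear()

    def count(self, post_id):
        return self.durable[post_id]        # may lag until the next flush
```

A cryptographic hash is used here rather than Python's built-in `hash()` because the latter is randomised per process for strings, which would route the same user differently on different servers. Simple modulo routing also reshuffles most keys when `NUM_SHARDS` changes; consistent hashing is the usual remedy and is left out of this sketch.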
Trade-offs to raise
- Resize on upload (storage for variants, fast reads) vs on the fly (compute per read). Pre-resize wins for a read-heavy feed.
- Hybrid fan-out trade (as Twitter).
- Eventual consistency of feeds and counts — fine.
The interview cue
“It’s the Twitter feed plus media: upload images to an object store via pre-signed URLs, async-generate resized variants, serve through a CDN; the home feed is the same hybrid fan-out (push for normal users, pull for celebrities) over post ids, hydrated with CDN image URLs. Metadata in sharded DBs, feeds in Redis.” Media plane (blob+CDN+variants) + reused fan-out is the answer; implementation next.