
Building Twitter

Implement the hybrid fan-out via a queue, the merge-at-read timeline assembly, Snowflake tweet IDs, and timeline caching.


Posting a tweet (async fan-out)

The write returns fast; fan-out happens asynchronously off a queue:

def post_tweet(author_id, text, media):
    tweet_id = snowflake()                       # time-sortable unique id
    media_url = upload_to_blob(media) if media else None
    tweets.put(tweet_id, {"author": author_id, "text": text,
                          "media": media_url, "ts": now()})
    fanout_queue.publish({"tweet_id": tweet_id, "author_id": author_id})
    return tweet_id
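
To make the decoupling concrete, here is a minimal in-process sketch. It assumes Python's standard `queue.Queue` as a stand-in for a durable message broker; the simplified `post_tweet` signature and the `delivered`, `worker_loop`, and `start_worker` names are hypothetical and exist only for illustration.

```python
import queue
import threading

fanout_queue = queue.Queue()   # stand-in for a durable message broker (assumption)
delivered = []                 # records which tweet ids the worker has processed


def post_tweet(tweet_id, author_id):
    """Enqueue the fan-out job and return immediately; no fan-out happens here."""
    fanout_queue.put({"tweet_id": tweet_id, "author_id": author_id})
    return tweet_id


def worker_loop():
    """Drain the queue until a None sentinel arrives."""
    while True:
        msg = fanout_queue.get()
        try:
            if msg is None:
                return
            # A real worker would push to follower timelines at this point.
            delivered.append(msg["tweet_id"])
        finally:
            fanout_queue.task_done()


def start_worker():
    """Run the worker on a background thread, separate from the write path."""
    thread = threading.Thread(target=worker_loop, daemon=True)
    thread.start()
    return thread
```

The caller gets its tweet id back before any timeline is touched. If the workers fall behind, fan-out is delayed but posting is not.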

The fan-out worker (hybrid)

Workers push the tweet into followers’ timelines — unless the author is a celebrity, in which case they skip the push (pulled at read time instead):

def fanout_worker(msg):
    author = msg["author_id"]
    if follower_count(author) > CELEBRITY_THRESHOLD:   # e.g. > 1M followers
        return                                          # do NOT fan out; pulled on read
    for follower in followers(author):                 # paginate the follower list
        timeline = f"timeline:{follower}"
        redis.lpush(timeline, msg["tweet_id"])
        redis.ltrim(timeline, 0, 799)                  # keep the newest 800 ids (end index is inclusive)

The celebrity check is what prevents one tweet from triggering tens of millions of list writes.
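
The same logic can be tried without Redis. The sketch below is illustrative only: it assumes an in-memory follower map (`FOLLOWERS`) and uses `collections.deque` with `maxlen` to mimic a capped list. The tiny thresholds and the `follower_pages` and `fanout` helpers are made up for the demo.

```python
from collections import defaultdict, deque

CELEBRITY_THRESHOLD = 3   # tiny for the demo; the text suggests something like 1M
TIMELINE_CAP = 5          # tiny for the demo; a real cap would be in the hundreds
PAGE_SIZE = 2             # followers fetched per page

# Hypothetical in-memory social graph: author -> follower ids.
FOLLOWERS = {
    "alice": ["u1", "u2", "u3"],
    "celeb": ["u1", "u2", "u3", "u4"],
}

# Stand-in for per-user Redis lists: newest tweet id on the left, capped length.
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_CAP))


def follower_pages(author, page_size=PAGE_SIZE):
    """Yield the follower list one page at a time instead of loading it whole."""
    all_followers = FOLLOWERS.get(author, [])
    for start in range(0, len(all_followers), page_size):
        yield all_followers[start:start + page_size]


def fanout(tweet_id, author):
    """Push tweet_id into each follower's timeline; return the number of writes."""
    if len(FOLLOWERS.get(author, [])) > CELEBRITY_THRESHOLD:
        return 0  # celebrity: skip the push, followers pull at read time
    writes = 0
    for page in follower_pages(author):
        for follower in page:
            # appendleft on a full deque drops the oldest id, like LPUSH + LTRIM
            timelines[follower].appendleft(tweet_id)
            writes += 1
    return writes
```

The return value makes the write amplification visible: one write per follower for a normal author, and zero for anyone over the threshold.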

Reading a timeline (merge precomputed + celebrities)

def home_timeline(user_id, k=50):
    base = redis.lrange(f"timeline:{user_id}", 0, k - 1)  # newest k pushed ids; LRANGE end is inclusive, cost grows with k
    celeb_ids = celebrities_followed(user_id)          # the few big accounts you follow
    celeb_tweets = [latest_tweets(c, k) for c in celeb_ids]   # pulled at read time
    merged = merge_by_time(base, *celeb_tweets)[:k]    # k-way merge by tweet_id (time)
    return hydrate(merged)                             # fetch text/media from cache/store

Snowflake ids are time-sortable, so the merge is just a heap-merge by id. Hydration turns ids into full tweets (batched cache reads).
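
One possible shape for `merge_by_time`, as a sketch rather than the course's actual helper: it assumes every input list holds integer tweet ids already sorted newest-first (largest id first) and uses the standard library's `heapq.merge`. The `k` keyword and the duplicate filtering are additions for illustration.

```python
import heapq


def merge_by_time(*id_lists, k=50):
    """Merge lists of tweet ids, each sorted newest-first, into the newest k ids."""
    merged = heapq.merge(*id_lists, reverse=True)  # lazy k-way merge, descending
    out = []
    last = None
    for tweet_id in merged:
        if tweet_id != last:      # equal ids come out adjacent, so this dedupes
            out.append(tweet_id)
            last = tweet_id
        if len(out) == k:
            break                 # stop early; the rest is never materialized
    return out
```

Because the merge is lazy, only about `k` ids are pulled across all inputs. One caveat: a pushed timeline is in arrival order, which can differ slightly from id order when fan-out workers run concurrently, so a real implementation might sort `base` first or keep timelines in a sorted set.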

Snowflake IDs

Tweet ids must be unique, roughly time-ordered (for sorting/merging), and generated without a central bottleneck:

64-bit id = [unused sign bit (1)] [timestamp (41 bits)] [machine id (10)] [sequence (12)]

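Each node combines the current millisecond timestamp, its own machine id, and a per-millisecond sequence counter. The result is unique and roughly time-sortable, and nodes need no coordination at generation time; the only requirement is that each node is assigned a distinct machine id up front. (Reuse this for any “unique time-ordered id” need.)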
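
A minimal generator along these lines might look like the sketch below. It assumes 41 timestamp bits, 10 machine-id bits, and 12 sequence bits, with the top bit left at zero. `EPOCH_MS` is an arbitrary custom epoch chosen for the example, and the `Snowflake` class and `timestamp_ms` helper are hypothetical names.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000   # arbitrary custom epoch (assumption), in ms
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095


class Snowflake:
    def __init__(self, machine_id):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    @staticmethod
    def _now_ms():
        return int(time.time() * 1000)

    def next_id(self):
        with self.lock:
            now = self._now_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue ids")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # 4096 ids already issued this millisecond: wait for the next
                    while now <= self.last_ms:
                        now = self._now_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                    | (self.machine_id << SEQUENCE_BITS)
                    | self.sequence)


def timestamp_ms(snowflake_id):
    """Recover the wall-clock millisecond embedded in an id."""
    return (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + EPOCH_MS
```

Ids from one node are strictly increasing. Ids from different nodes are ordered only as precisely as the nodes' clocks agree, which is why the ordering is called "roughly" time-sorted.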

Storage and sharding

  • Tweets sharded by tweet id (or author); replicated; media kept in blob storage and served through a CDN.
  • Timelines are capped lists in Redis, sharded by user id.
  • Social graph (followers / following) in a sharded store; “followers(author)” must page efficiently for fan-out.
  • A user’s profile timeline (their own tweets) is a simple per-author list.
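
As a rough illustration of "sharded by user id", the sketch below maps a key to a shard with a stable hash. `NUM_SHARDS` and `shard_for` are hypothetical, and plain hash-mod is only the simplest scheme. Production systems often use consistent hashing or a lookup table so that shards can be added without remapping most keys.

```python
import hashlib

NUM_SHARDS = 16   # hypothetical shard count


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a user id (or tweet id) to a shard index with a process-independent hash."""
    # Python's built-in hash() is randomized per process for strings, so use
    # a digest that every service instance computes identically.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

A timeline read for a user then goes to `shard_for(user_id)`, and every fan-out write for that user lands on the same shard.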

Handling the hard cases

  • Celebrity tweet → not fanned out; merged at read (caps write amplification).
  • New follow → backfill: merge the newly-followed user’s recent tweets into your timeline (or just pull until the next refresh).
  • Fan-out lag → it’s async, so a tweet may take a few seconds to appear — eventual consistency, acceptable.
  • Hot tweet (viral) → tweet content is cached; the CDN serves media; likes/retweets counted asynchronously (approximate counters).
  • Timeline rebuild (cache loss) → regenerate from followees’ recent tweets (fan-out-on-read fallback).
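
The backfill and rebuild cases both reduce to merging per-author lists. Here is a small sketch under stated assumptions: `FOLLOWING` and `AUTHOR_TWEETS` are hypothetical in-memory stand-ins for the social graph and the per-author profile lists, and every list is sorted newest-first.

```python
import heapq
from itertools import islice

TIMELINE_CAP = 800   # assumed cap on a precomputed timeline

# Hypothetical stand-ins for the social graph and per-author tweet lists.
FOLLOWING = {"u1": ["alice", "bob"]}
AUTHOR_TWEETS = {"alice": [30, 20, 10], "bob": [25, 15]}   # newest-first


def rebuild_timeline(user_id, cap=TIMELINE_CAP):
    """Regenerate a lost timeline from followees' recent tweets."""
    per_author = [AUTHOR_TWEETS.get(a, [])[:cap] for a in FOLLOWING.get(user_id, [])]
    return list(islice(heapq.merge(*per_author, reverse=True), cap))


def backfill_on_follow(timeline, new_author, cap=TIMELINE_CAP):
    """Merge a newly followed author's recent tweets into an existing timeline."""
    recent = AUTHOR_TWEETS.get(new_author, [])[:cap]
    return list(islice(heapq.merge(timeline, recent, reverse=True), cap))
```

Each author contributes at most `cap` ids, so the cost of a rebuild is bounded by the number of followees times the cap, regardless of how much any one author has tweeted.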

Scaling

  • Reads dominate → Redis timelines + tweet cache absorb them; replicate hot shards.
  • Writes/fan-out → queue + worker fleet; the hybrid caps amplification.
  • Counts (likes/views) → async approximate counters, not synchronous DB increments.
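
One way to read "async approximate counters" is to buffer increments in memory and flush them to the store in batches. The `BufferedCounter` class below is a hypothetical sketch, with a plain dict standing in for the durable store. Counts read from the store lag by whatever is still buffered, which is the "approximate" part.

```python
import threading
from collections import Counter


class BufferedCounter:
    def __init__(self, store):
        self.store = store          # dict-like stand-in for the durable store
        self.pending = Counter()    # increments not yet written
        self.lock = threading.Lock()

    def incr(self, tweet_id, n=1):
        """Record a like/view in memory only; no store write on the hot path."""
        with self.lock:
            self.pending[tweet_id] += n

    def flush(self):
        """Apply all buffered increments in one batch; return tweets touched."""
        with self.lock:
            batch, self.pending = self.pending, Counter()
        for tweet_id, delta in batch.items():
            self.store[tweet_id] = self.store.get(tweet_id, 0) + delta
        return len(batch)
```

A thousand likes on a viral tweet collapse into a single store write per flush interval. The trade-off is that a crash loses the unflushed increments, which is usually acceptable for like and view counts.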

The takeaway

Concrete signals: async hybrid fan-out (push for normal, pull-and-merge for celebrities), Snowflake ids enabling a time-ordered merge at read, capped Redis timelines, and CDN media. The push/pull hybrid is the reusable feed pattern — Instagram, news feed, and TikTok are variations on it.