Skip to content
System design course
Ch.4 · Designing real systems·how to build it ·7 min read

Building Reddit

Implement the hot-ranking score, async vote aggregation, cached per-community listings, and the materialized-path comment tree with lazy loading.


The hot-ranking function

Compute a post’s hotness from score and age; store it so listings just sort by it:

import math
def hotness(ups, downs, created_at):
    score = ups - downs
    order = math.log10(max(abs(score), 1))          # dampen large scores
    sign = 1 if score > 0 else (-1 if score < 0 else 0)
    seconds = created_at - EPOCH
    return round(sign * order + seconds / 45000, 7)  # age steadily lifts/sinks

Recompute on each (aggregated) vote and on a schedule for time decay; keep per-community listings in a sorted set (Redis ZSET) keyed by hotness for O(log n) inserts and O(1) top-k reads.

Async vote aggregation (avoid hot-row contention)

A synchronous UPDATE posts SET score = score+1 per vote melts under load (every vote on a hot post contends on one row). Instead, record the vote and aggregate:

def cast_vote(user_id, target_id, value):
    prev = votes.get(user_id, target_id)             # idempotent: one vote per user
    if prev == value: return
    votes.put(user_id, target_id, value)             # store/replace the vote
    vote_queue.publish({"target": target_id, "delta": value - (prev or 0)})

def aggregator():                                    # batches deltas
    for batch in vote_queue.consume_batched(window="1s"):
        for target, delta in sum_deltas(batch).items():
            new_score = store.incr_score(target, delta)
            listing_cache.zadd(community_of(target), hotness_for(target), target)

Counts become eventually consistent (and are often fuzzed) — a deliberate trade for scale and anti-gaming.

Reading a listing

def listing(community, sort="hot", k=25, cursor=0):
    if sort == "hot":
        return listing_cache.zrevrange(community, cursor, cursor + k)   # precomputed
    if sort == "new":
        return db.posts_by_time(community, k, cursor)
    if sort == "top":
        return db.posts_by_score(community, window, k, cursor)

The personalized front page merges the top of each subscribed community’s hot listing (k-way merge), cached per user briefly.

The comment tree (materialized path + lazy loading)

Store each comment with a path so the whole subtree sorts and loads efficiently:

comment: { id, post_id, parent_id, path: "0003/0007/0002", score, body }
# children of a comment share its path prefix; siblings sort by the last segment or score

Load the top comments and a couple of reply levels; defer the rest:

def load_thread(post_id, k=200):
    top = db.comments(post_id, parent=None, order_by="score", limit=k)
    for c in top:
        c.replies = db.comments(post_id, parent=c.id, order_by="score", limit=10)
        c.more = c.reply_count > 10                  # "load more" stub for deep/long branches
    return build_tree(top)

A giant thread (100k+ comments) is never loaded whole — “load more” fetches deeper branches on demand.

Storage and scale

  • Posts/comments/votes sharded (posts by community or id; comments by post id so a thread is co-located).
  • Listings in Redis ZSETs per community; front page merged from those.
  • Comment counts/scores updated via the same async aggregation.

Failure handling

  • Vote queue lag → counts trail slightly (acceptable; fuzzed anyway).
  • Listing cache loss → rebuild from the DB (sort by stored hotness).
  • Hot post (massive votes/comments) → async aggregation + cached listing absorb it; comment loads stay lazy.
  • Vote gaming → rate-limit votes per user, fuzz counts, detect brigading.

The takeaway

Concrete signals: a precomputed hotness in Redis ZSETs for instant listings, idempotent async-aggregated voting to dodge hot-row contention (eventually consistent, fuzzed), and a materialized-path comment tree with lazy “load more.” Vote ranking + async counters + lazy trees are the forum-specific patterns atop the usual sharding and caching.