Building Reddit
Implement the hot-ranking score, async vote aggregation, cached per-community listings, and the materialized-path comment tree with lazy loading.
The hot-ranking function
Compute a post’s hotness from score and age; store it so listings just sort by it:
import math
def hotness(ups, downs, created_at):
score = ups - downs
order = math.log10(max(abs(score), 1)) # dampen large scores
sign = 1 if score > 0 else (-1 if score < 0 else 0)
seconds = created_at - EPOCH
return round(sign * order + seconds / 45000, 7) # age steadily lifts/sinks
Recompute on each (aggregated) vote and on a schedule for time decay; keep per-community listings in a sorted set (Redis ZSET) keyed by hotness for O(log n) inserts and O(1) top-k reads.
Async vote aggregation (avoid hot-row contention)
A synchronous UPDATE posts SET score = score+1 per vote melts under load (every vote
on a hot post contends on one row). Instead, record the vote and aggregate:
def cast_vote(user_id, target_id, value):
prev = votes.get(user_id, target_id) # idempotent: one vote per user
if prev == value: return
votes.put(user_id, target_id, value) # store/replace the vote
vote_queue.publish({"target": target_id, "delta": value - (prev or 0)})
def aggregator(): # batches deltas
for batch in vote_queue.consume_batched(window="1s"):
for target, delta in sum_deltas(batch).items():
new_score = store.incr_score(target, delta)
listing_cache.zadd(community_of(target), hotness_for(target), target)
Counts become eventually consistent (and are often fuzzed) — a deliberate trade for scale and anti-gaming.
Reading a listing
def listing(community, sort="hot", k=25, cursor=0):
if sort == "hot":
return listing_cache.zrevrange(community, cursor, cursor + k) # precomputed
if sort == "new":
return db.posts_by_time(community, k, cursor)
if sort == "top":
return db.posts_by_score(community, window, k, cursor)
The personalized front page merges the top of each subscribed community’s hot listing (k-way merge), cached per user briefly.
The comment tree (materialized path + lazy loading)
Store each comment with a path so the whole subtree sorts and loads efficiently:
comment: { id, post_id, parent_id, path: "0003/0007/0002", score, body }
# children of a comment share its path prefix; siblings sort by the last segment or score
Load the top comments and a couple of reply levels; defer the rest:
def load_thread(post_id, k=200):
top = db.comments(post_id, parent=None, order_by="score", limit=k)
for c in top:
c.replies = db.comments(post_id, parent=c.id, order_by="score", limit=10)
c.more = c.reply_count > 10 # "load more" stub for deep/long branches
return build_tree(top)
A giant thread (100k+ comments) is never loaded whole — “load more” fetches deeper branches on demand.
Storage and scale
- Posts/comments/votes sharded (posts by community or id; comments by post id so a thread is co-located).
- Listings in Redis ZSETs per community; front page merged from those.
- Comment counts/scores updated via the same async aggregation.
Failure handling
- Vote queue lag → counts trail slightly (acceptable; fuzzed anyway).
- Listing cache loss → rebuild from the DB (sort by stored hotness).
- Hot post (massive votes/comments) → async aggregation + cached listing absorb it; comment loads stay lazy.
- Vote gaming → rate-limit votes per user, fuzz counts, detect brigading.
The takeaway
Concrete signals: a precomputed hotness in Redis ZSETs for instant listings, idempotent async-aggregated voting to dodge hot-row contention (eventually consistent, fuzzed), and a materialized-path comment tree with lazy “load more.” Vote ranking + async counters + lazy trees are the forum-specific patterns atop the usual sharding and caching.