Designing Reddit
Communities, voting, and ranked listings — the hot/top ranking algorithms and deeply nested comment trees that define a forum at scale.
The problem
Design Reddit: users post to communities (subreddits), vote posts and comments up/down, and browse listings ranked by hot/top/new. The distinctive challenges are vote-based ranking and deeply nested comment trees — different from a follow-graph feed.
Step 1 — Requirements
Functional: create/join communities; submit posts (link/text/media) to a community; up/downvote posts and comments; view ranked listings (hot, top, new, controversial); nested comment threads; user karma.
Non-functional: read-heavy, low-latency listings and comment loads, scalable, available; vote counts can be eventually consistent and even slightly fuzzed.
Step 2 — Data model
Community: id, name, subscribers
Post: id, community_id, author, title, body/url, score, created_at, hotness
Vote: user_id, target_id, value(+1/-1) # dedup: one vote per user per target
Comment: id, post_id, parent_comment_id, author, body, score, created_at
The parent_comment_id makes comments a tree (a comment replies to a post or
another comment).
Step 3 — Ranking algorithms (the core)
Listings are per-community (and a personalized front page = merge of subscribed communities). The ranking functions:
- Hot — balances score against age so fresh, upvoted posts rise and decay
over time. Reddit’s classic formula:
log10(score) + sign(score) × age / 45000— the log dampens huge scores (the 10,000th upvote matters less than the 10th), and the age term steadily pushes posts down. Precompute a hotness value per post on each vote and sort listings by it. - Top — highest score within a time window (today/week/all). Just sort by score.
- New — by
created_at. - Controversial — high engagement with balanced up/down votes.
Because hotness changes only on votes/time, precompute and store it, and keep per-community listings in a cache/sorted structure for O(1) reads.
Step 4 — Voting at scale
- Idempotent votes — one vote per user per target; store the vote and adjust the
target’s score (a user can change/undo). Dedup by
(user, target). - Async score updates — at high vote volume, don’t synchronously update the DB row per vote (hot row contention). Buffer/aggregate votes (queue → batched score updates) and recompute hotness; counts are eventually consistent and often fuzzed (Reddit deliberately approximates to deter gaming).
Step 5 — Nested comments
The comment tree is the other hard part:
- Storage — each comment has
parent_comment_id; loading a thread means fetching the tree. Store a path / materialized path (e.g.1/4/9) or adjacency list to reconstruct nesting and sort siblings by score. - Loading — load the top N comments and their top replies; “load more” lazily fetches deeper/longer branches (a huge thread has hundreds of thousands of comments — never load all at once).
- Ranking within a thread — sort sibling comments by score (best), new, or controversial.
Step 6 — Architecture
post/comment/vote → app → DB (posts, comments, votes; sharded)
votes → queue → aggregate → update score + hotness → community listing cache
read listing → cache (precomputed hot/top per community) → hydrate
read thread → comment tree (paginated, lazy "load more")
Trade-offs to raise
- Precompute hotness (fast reads, recompute on vote) vs sort on read (fresh, slow).
- Synchronous vote counts (accurate, hot-row contention) vs async/approximate (scalable, fuzzy). Reddit chooses scalable + fuzzed.
- Eager vs lazy comment loading — lazy “load more” is mandatory for big threads.
The interview cue
“Posts live in communities; listings sort by a precomputed hotness (log-score + time decay) cached per community; votes are idempotent, aggregated async, and counts are eventually-consistent/fuzzed to scale and resist gaming; nested comments use a parent-pointer/materialized-path tree, lazily loaded with ‘load more’.” Hot ranking + async voting + lazy comment trees is the answer; implementation next.