Designing Reddit · Lyte Code

Communities, voting, and ranked listings — the hot/top ranking algorithms and deeply nested comment trees that define a forum at scale.

The problem

Design Reddit: users post to communities (subreddits), vote posts and comments up/down, and browse listings ranked by hot/top/new. The distinctive challenges are vote-based ranking and deeply nested comment trees — different from a follow-graph feed.

Step 1 — Requirements

Functional: create/join communities; submit posts (link/text/media) to a community; up/downvote posts and comments; view ranked listings (hot, top, new, controversial); nested comment threads; user karma.

Non-functional: read-heavy, low-latency listings and comment loads, scalable, available; vote counts can be eventually consistent and even slightly fuzzed.

Step 2 — Data model

Community: id, name, subscribers
Post:      id, community_id, author, title, body/url, score, created_at, hotness
Vote:      user_id, target_id, value(+1/-1)     # dedup: one vote per user per target
Comment:   id, post_id, parent_comment_id, author, body, score, created_at

The parent_comment_id makes comments a tree (a comment replies to a post or another comment).

Step 3 — Ranking algorithms (the core)

Listings are per-community (and a personalized front page = merge of subscribed communities). The ranking functions:

Hot — balances score against age so fresh, upvoted posts rise and decay over time. Reddit’s classic formula: log10(score) + sign(score) × age / 45000 — the log dampens huge scores (the 10,000th upvote matters less than the 10th), and the age term steadily pushes posts down. Precompute a hotness value per post on each vote and sort listings by it.
Top — highest score within a time window (today/week/all). Just sort by score.
New — by created_at.
Controversial — high engagement with balanced up/down votes.

Because hotness changes only on votes/time, precompute and store it, and keep per-community listings in a cache/sorted structure for O(1) reads.

Step 4 — Voting at scale

Idempotent votes — one vote per user per target; store the vote and adjust the target’s score (a user can change/undo). Dedup by (user, target).
Async score updates — at high vote volume, don’t synchronously update the DB row per vote (hot row contention). Buffer/aggregate votes (queue → batched score updates) and recompute hotness; counts are eventually consistent and often fuzzed (Reddit deliberately approximates to deter gaming).

Step 5 — Nested comments

The comment tree is the other hard part:

Storage — each comment has parent_comment_id; loading a thread means fetching the tree. Store a path / materialized path (e.g. 1/4/9) or adjacency list to reconstruct nesting and sort siblings by score.
Loading — load the top N comments and their top replies; “load more” lazily fetches deeper/longer branches (a huge thread has hundreds of thousands of comments — never load all at once).
Ranking within a thread — sort sibling comments by score (best), new, or controversial.

Step 6 — Architecture

post/comment/vote → app → DB (posts, comments, votes; sharded)
votes → queue → aggregate → update score + hotness → community listing cache
read listing → cache (precomputed hot/top per community) → hydrate
read thread → comment tree (paginated, lazy "load more")

Trade-offs to raise

Precompute hotness (fast reads, recompute on vote) vs sort on read (fresh, slow).
Synchronous vote counts (accurate, hot-row contention) vs async/approximate (scalable, fuzzy). Reddit chooses scalable + fuzzed.
Eager vs lazy comment loading — lazy “load more” is mandatory for big threads.

The interview cue

“Posts live in communities; listings sort by a precomputed hotness (log-score + time decay) cached per community; votes are idempotent, aggregated async, and counts are eventually-consistent/fuzzed to scale and resist gaming; nested comments use a parent-pointer/materialized-path tree, lazily loaded with ‘load more’.” Hot ranking + async voting + lazy comment trees is the answer; implementation next.