Designing a distributed message queue
Build a Kafka-style durable log — partitioned topics, the publish/subscribe model, consumer groups, and the delivery guarantees that make async systems reliable.
The problem
Design a distributed message queue / streaming log (think Kafka): a durable pipe that decouples producers from consumers. Producers append messages; consumers read them at their own pace. It absorbs spikes, enables async processing, and is the backbone of nearly every large system in this chapter (notifications, feeds, metrics, order pipelines).
Step 1 — Requirements
Functional: producers publish to a topic; consumers subscribe and read; messages are durable (survive crashes); support many producers/consumers and high throughput; preserve ordering (at least within a partition); let consumers replay from a past position.
Non-functional: high throughput (millions of msgs/sec), durability (no loss once acked), scalability, availability, and low end-to-end latency.
Step 2 — The core idea: a partitioned, append-only log
A topic is split into partitions, each an append-only, ordered log on disk. Messages get a monotonically increasing offset within a partition.
- Append-only sequential writes are why it’s so fast (disk loves sequential I/O); reads are sequential scans from an offset.
- Partitioning gives horizontal scale and parallelism — partitions spread across broker nodes, and consumers read partitions in parallel.
- Ordering is guaranteed within a partition, not across. Pick a partition key (e.g. user id) so messages that must stay ordered share a partition.
Step 3 — Pub/sub and consumer groups
- Producers send to a topic; the message lands in a partition (by key hash or round robin).
- Consumers join a consumer group; each partition is read by exactly one consumer in the group, so adding consumers scales throughput up to the partition count.
- Multiple groups each get the full stream independently (one for analytics, one for the email service) — that’s the pub/sub fan-out.
Step 4 — Durability and replication
Each partition is replicated (leader + followers) across brokers. Producers write to the leader; followers replicate. A message is committed once enough replicas (the in-sync set) have it. On leader failure, an in-sync follower is promoted — no data loss for committed messages.
Step 5 — Delivery guarantees
The classic spectrum (state the trade):
- At-most-once — consumer commits its offset before processing; a crash loses the message. Fast, lossy.
- At-least-once — commit after processing; a crash reprocesses → duplicates. The common default; consumers must be idempotent.
- Exactly-once — at-least-once + dedup/transactions (idempotent producer + transactional writes). Strongest, most expensive.
Step 6 — Consumers track their own offset
Crucially, the broker doesn’t track per-consumer state — each consumer group stores its committed offset (where it’s read up to). This makes brokers simple and lets consumers replay (rewind the offset) or skip. Messages are retained by time or size (not deleted on read), enabling multiple consumers and replay.
Trade-offs to raise
- Ordering vs parallelism — more partitions = more throughput but ordering only within a partition; choose the key carefully.
- At-least-once + idempotency vs the cost of exactly-once.
- Retention — longer retention enables replay but costs storage.
- Push vs pull — Kafka consumers pull (better backpressure and batching) vs brokers pushing.
The interview cue
“A topic is a set of partitioned append-only logs; producers key-partition for ordering, consumer groups parallelize reads (one consumer per partition), partitions are replicated with leader/follower for durability, consumers track their own offset for replay, and I’ll use at-least-once with idempotent consumers.” Partitioned log + consumer groups + offsets is the whole mental model; the write/read internals are next.