Designing a collaborative editor (Google Docs)
Real-time collaborative editing — how multiple people edit one document at once without overwriting each other, via operational transformation or CRDTs.
The problem
Design Google Docs: many people edit the same document simultaneously and see each other’s changes live, with no edit lost and everyone converging to the same final text. The hard part isn’t storage — it’s concurrent conflict-free editing, which is why this is the canonical OT/CRDT problem.
Step 1 — Requirements
Functional: multiple users edit a doc concurrently; changes propagate in real time; everyone converges to the same document; show others’ cursors/presence; persistence and version history; offline edits reconcile.
Non-functional: low latency (keystrokes feel instant), consistency (no lost edits, all replicas converge), availability, and scale (many docs, many concurrent editors per doc).
Step 2 — Why naive approaches fail
If two users edit concurrently and you just send full-document saves or raw “insert at index 5” operations, edits clobber each other or land in the wrong place: once I have inserted text earlier in the document, your index 5 no longer points at the same character on my copy. You need a way to reconcile concurrent operations so that every replica converges to the same text regardless of the order in which operations arrive.
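A minimal sketch of the failure, assuming two hypothetical users (Alice and Bob) who start from the same text and exchange raw index-based inserts with no adjustment. The helper `apply_insert` is illustrative only:

```python
def apply_insert(text: str, index: int, chars: str) -> str:
    """Apply a raw 'insert at index' operation with no adjustment."""
    return text[:index] + chars + text[index:]


base = "hello world"

# Both users start from `base` and edit concurrently.
alice_op = (0, ">> ")  # Alice prepends a marker
bob_op = (5, ",")      # Bob adds a comma after "hello"

# Alice's replica: her own op first, then Bob's raw op arrives.
alice_view = apply_insert(apply_insert(base, *alice_op), *bob_op)

# Bob's replica: his own op first, then Alice's raw op arrives.
bob_view = apply_insert(apply_insert(base, *bob_op), *alice_op)

print(alice_view)  # ">> he,llo world"  (comma landed in the wrong place)
print(bob_view)    # ">> hello, world"
```

The two replicas applied the same two operations and still diverged, because Bob's index 5 referred to a different character once Alice's insert had shifted the text.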
Step 3 — The two solutions
Operational Transformation (OT):
- Represent edits as operations (insert/delete at a position).
- A central server orders operations and transforms each incoming op against the concurrent ops its sender had not yet seen, so it applies at the correct adjusted position.
- Example: you insert “X” at 2 while I insert “Y” at 2 concurrently; the transform shifts one of the two inserts to position 3 using a deterministic tie-break, so both of us end up with the same text.
- Pro: compact, proven (Google Docs uses OT). Con: transformation functions are notoriously tricky to get right; usually needs a central server.
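The insert-versus-insert case from the example can be sketched as follows. This is a simplified illustration that handles inserts only, not a production transform; the `site` field and the "lower site name wins ties" rule are assumptions made here for the tie-break:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str  # hypothetical client id, used only to break ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against` has been applied."""
    shifts = against.pos < op.pos or (
        against.pos == op.pos and against.site < op.site
    )
    if shifts:
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


doc = "abcd"
x = Insert(2, "X", site="alice")
y = Insert(2, "Y", site="bob")

# Replica that applies x first, then y transformed against x.
left = apply(apply(doc, x), transform(y, x))
# Replica that applies y first, then x transformed against y.
right = apply(apply(doc, y), transform(x, y))

print(left, right)  # abXYcd abXYcd
```

Both application orders give the same text. A full OT system also needs transforms for delete versus insert and delete versus delete, which is where most of the difficulty lies.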
CRDTs (Conflict-free Replicated Data Types):
- Give every character a unique, ordered identifier (not a mutable index), so inserts/deletes commute — apply in any order and converge automatically.
- Pro: no central transformation, works peer-to-peer and offline-first. Con: metadata overhead (ids per character); tombstones for deletes.
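A toy sequence CRDT can illustrate why unique ordered ids make operations commute. The id scheme below (an exact fractional position plus a site name) and the `between` allocator are assumptions chosen for brevity; real CRDTs use more elaborate identifiers:

```python
from fractions import Fraction
from itertools import permutations


class Replica:
    """Toy sequence CRDT: a map of id -> char plus a set of deleted ids."""

    def __init__(self):
        self.chars = {}          # id -> char, where id = (position, site)
        self.tombstones = set()  # ids that have been deleted

    def apply(self, op):
        kind, char_id, ch = op
        if kind == "ins":
            self.chars[char_id] = ch
        else:
            # "del": record the id even if its insert has not arrived yet.
            self.tombstones.add(char_id)

    def text(self) -> str:
        live = sorted(i for i in self.chars if i not in self.tombstones)
        return "".join(self.chars[i] for i in live)


def between(left: Fraction, right: Fraction, site: str):
    """Hypothetical id allocator: midpoint position, tie-broken by site."""
    return ((left + right) / 2, site)


ops = [
    ("ins", (Fraction(1), "alice"), "a"),
    ("ins", (Fraction(2), "alice"), "c"),
    ("ins", between(Fraction(1), Fraction(2), "alice"), "b"),  # Alice: a|c
    ("ins", between(Fraction(1), Fraction(2), "bob"), "X"),    # Bob, same gap
    ("del", (Fraction(1), "alice"), None),                     # Bob deletes "a"
]

# Deliver the same ops in every possible order and collect the results.
results = set()
for order in permutations(ops):
    replica = Replica()
    for op in order:
        replica.apply(op)
    results.add(replica.text())

print(results)  # {'bXc'}
```

Every delivery order produces the same text, including orders where the delete arrives before the insert it targets. The cost is visible too: each character carries an id, and deleted ids are kept as tombstones.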
Name both; pick OT (server-coordinated) or CRDT (decentralized/offline) per the requirements.
Step 4 — Architecture
clients ⇄ WebSocket ⇄ collaboration servers ⇄ (OT engine / CRDT merge)
                              │ persist ops
                              ▼
          document store (ops log + periodic snapshots)
- WebSocket connections carry ops both ways in real time.
- A collaboration server owns a given document (or a shard of docs), applies the OT/CRDT logic, broadcasts transformed ops to all editors, and persists them.
- Presence (cursors, who’s online) rides the same channel.
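The server's role for one document can be sketched in memory, under the assumption of an OT-style design with insert-only ops. `DocSession`, `submit`, and the per-client `outboxes` (plain lists standing in for WebSocket connections) are hypothetical names; client-side handling of pending ops is omitted:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: str  # hypothetical client id, used to break ties


def transform(op: Insert, against: Insert) -> Insert:
    """Rewrite `op` so it can be applied after `against`."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return Insert(op.pos + len(against.text), op.text, op.site)
    return op


@dataclass
class DocSession:
    """State held by the collaboration server that owns one document."""
    text: str = ""
    log: list = field(default_factory=list)       # ordered, persisted ops
    outboxes: dict = field(default_factory=dict)  # site -> messages to send

    def join(self, site: str) -> None:
        self.outboxes[site] = []

    def submit(self, site: str, base_rev: int, op: Insert) -> int:
        # Transform against every op committed after the client's base revision.
        for missed in self.log[base_rev:]:
            op = transform(op, missed)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.log.append(op)
        rev = len(self.log)
        # Acknowledge the sender, broadcast the transformed op to everyone else.
        for peer, outbox in self.outboxes.items():
            outbox.append(("ack", rev) if peer == site else ("op", rev, op))
        return rev


session = DocSession(text="abcd")
session.join("alice")
session.join("bob")

# Both clients composed their op against revision 0.
session.submit("alice", 0, Insert(2, "X", "alice"))
session.submit("bob", 0, Insert(2, "Y", "bob"))

print(session.text)               # abXYcd
print(session.outboxes["alice"])  # ack for rev 1, then bob's op shifted to pos 3
```

The `log` list is the single ordering point: its index is the revision number, and it is the same sequence that would be appended to the document store.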
Step 5 — Persistence and history
Store an append-only operation log plus periodic snapshots so that any version can be reconstructed quickly: load the latest snapshot at or before the target revision and replay the ops that follow it. This gives version history cheaply and a basis for undo. Snapshots cap replay cost.
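Snapshot-plus-replay can be sketched as below. The op encoding, the `DocHistory` class, and the snapshot interval of 3 are illustrative assumptions; a real store would persist both structures rather than keep them in memory:

```python
import bisect


def apply_op(text: str, op) -> str:
    kind, pos, arg = op
    if kind == "ins":
        return text[:pos] + arg + text[pos:]
    return text[:pos] + text[pos + arg:]  # "del": arg is a length


class DocHistory:
    SNAPSHOT_EVERY = 3  # illustrative interval

    def __init__(self):
        self.log = []           # append-only; op i produces revision i + 1
        self.snap_revs = [0]    # revisions that have a snapshot, ascending
        self.snap_texts = [""]  # full text at each of those revisions
        self.head = ""

    def append(self, op) -> None:
        self.head = apply_op(self.head, op)
        self.log.append(op)
        if len(self.log) % self.SNAPSHOT_EVERY == 0:
            self.snap_revs.append(len(self.log))
            self.snap_texts.append(self.head)

    def at(self, rev: int) -> str:
        """Rebuild the text at `rev` from the nearest earlier snapshot."""
        i = bisect.bisect_right(self.snap_revs, rev) - 1
        text = self.snap_texts[i]
        for op in self.log[self.snap_revs[i]:rev]:
            text = apply_op(text, op)
        return text


history = DocHistory()
for op in [
    ("ins", 0, "hello"),
    ("ins", 5, " world"),
    ("del", 0, 1),
    ("ins", 0, "J"),
    ("del", 5, 6),
]:
    history.append(op)

print(history.at(2))  # hello world
print(history.at(4))  # Jello world
print(history.at(5))  # Jello
```

Reading revision 4 replays one op on top of the revision-3 snapshot instead of four ops from the start, which is the sense in which snapshots cap replay cost.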
Step 6 — Scale and routing
- One server owns a doc at a time (so ops have a single ordering point) — route all editors of a doc to the same server (consistent hashing on doc id); on failure, another server takes over from the persisted log.
- Most docs have few concurrent editors, so this shards naturally across many docs.
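Routing by consistent hashing on the doc id might look like the sketch below. The `HashRing` class, the server names, and the choice of 64 virtual nodes per server are assumptions for illustration:

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping doc ids to collaboration servers."""

    def __init__(self, servers, vnodes: int = 64):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def add(self, server: str) -> None:
        for v in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{server}#{v}"), server))

    def remove(self, server: str) -> None:
        self.ring = [(h, s) for h, s in self.ring if s != server]

    def owner(self, doc_id: str) -> str:
        # First ring entry at or after the doc's hash, wrapping around.
        i = bisect.bisect_left(self.ring, (self._hash(doc_id), ""))
        return self.ring[i % len(self.ring)][1]


ring = HashRing(["collab-1", "collab-2", "collab-3"])
docs = [f"doc-{i}" for i in range(1000)]

before = {d: ring.owner(d) for d in docs}
ring.remove("collab-2")  # simulate a server failure
after = {d: ring.owner(d) for d in docs}

moved = [d for d in docs if before[d] != after[d]]
# Only docs that were owned by the failed server change owner.
print(all(before[d] == "collab-2" for d in moved))  # True
```

Every editor of a given doc resolves to the same owner, and when a server fails only its docs are reassigned; the new owner would then rebuild state from the persisted log and snapshots.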
Trade-offs to raise
- OT (central, compact, hard to implement) vs CRDT (decentralized, offline-friendly, heavier metadata).
- Single-owner-per-doc (simple ordering, failover needed) vs multi-master (complex).
- Op log vs snapshots — log for fidelity, snapshots for fast load (use both).
The interview cue
“Edits are operations reconciled with OT (server-coordinated) or CRDTs (if we want offline/P2P); clients connect over WebSockets to a collaboration server that owns the doc, transforms and broadcasts ops, and persists an **op log + snapshots** for history. Route all editors of a doc to one server for a single ordering point.” Concurrent-edit convergence (OT/CRDT) is the entire crux.