Designing a collaborative editor (Google Docs)

Real-time collaborative editing — how multiple people edit one document at once without overwriting each other, via operational transformation or CRDTs.

The problem

Design Google Docs: many people edit the same document simultaneously and see each other’s changes live, with no edit lost and everyone converging to the same final text. The hard part isn’t storage — it’s concurrent conflict-free editing, which is why this is the canonical OT/CRDT problem.

Step 1 — Requirements

Functional: multiple users edit a doc concurrently; changes propagate in real time; everyone converges to the same document; show others’ cursors/presence; persistence and version history; offline edits reconcile.

Non-functional: low latency (keystrokes feel instant), consistency (no lost edits, all replicas converge), availability, and scale (many docs, many concurrent editors per doc).

Step 2 — Why naive approaches fail

If two users edit the same position and you just send full-document saves or raw “insert at index 5” operations, edits clobber each other or apply at the wrong place (your index 5 isn’t mine after I inserted text earlier). You need a way to transform concurrent operations so they compose correctly regardless of order.
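
A minimal sketch of the index problem, assuming a plain string document and a hypothetical apply_insert helper (neither comes from any real editor's API). User A wants "big " before "world"; user B concurrently prepends ">> ". Applying A's raw index after B's edit lands in the wrong place:

```python
def apply_insert(text: str, index: int, chars: str) -> str:
    """Insert chars at a raw index, with no awareness of concurrent edits."""
    return text[:index] + chars + text[index:]


base = "hello world"

# A's op was computed against base: index 6 is just before "world".
a_index, a_chars = 6, "big "
# B's concurrent op prepends three characters.
b_index, b_chars = 0, ">> "

# The server happens to apply B first, then replays A's op unchanged.
after_b = apply_insert(base, b_index, b_chars)
naive = apply_insert(after_b, a_index, a_chars)

# What A meant: the same spot, shifted right by the length of B's insert.
intended = apply_insert(after_b, a_index + len(b_chars), a_chars)

print(naive)     # >> helbig lo world
print(intended)  # >> hello big world
```

The shift by len(b_chars) in the last step is exactly the adjustment that a transformation function has to compute in general.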

Step 3 — The two solutions

Operational Transformation (OT):

Represent edits as operations (insert/delete at a position).
A central server orders operations and transforms each incoming op against the ops it missed, so it applies at the correct adjusted position.
Example: you insert “X” at 2 while I insert “Y” at 2 concurrently; the server orders the two ops, say yours first, and shifts mine to position 3, so we both end up with “XY” at that spot.
Pro: compact, proven (Google Docs uses OT). Con: transformation functions are notoriously tricky to get right; usually needs a central server.
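
The insert-versus-insert case can be sketched as below. This is an illustration under assumptions, not how any production engine is written: the Insert type, the site field used to break ties, and the transform function are all hypothetical names, and deletes (where most of the difficulty lives) are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used only to break position ties


def apply(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite op so it can be applied after other has already been applied."""
    lands_before = other.pos < op.pos or (
        other.pos == op.pos and other.site < op.site
    )
    if lands_before:
        # other pushed our target position to the right by its length.
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


doc = "abcd"
x = Insert(2, "X", site=1)  # your edit
y = Insert(2, "Y", site=2)  # my concurrent edit

# One replica sees x first, the other sees y first; each transforms the late op.
replica_1 = apply(apply(doc, x), transform(y, x))
replica_2 = apply(apply(doc, y), transform(x, y))

print(replica_1, replica_2)  # abXYcd abXYcd
```

The tie-break on site is what makes the two orders agree; without it, both replicas would keep their own op at position 2 and diverge.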

CRDTs (Conflict-free Replicated Data Types):

Give every character a unique, ordered identifier (not a mutable index), so inserts/deletes commute — apply in any order and converge automatically.
Pro: no central transformation, works peer-to-peer and offline-first. Con: metadata overhead (ids per character); tombstones for deletes.
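
A toy sequence CRDT along these lines, as a sketch only. The Replica class, the (fraction, site, clock) identifier layout, and midpoint allocation are assumptions made for illustration; real designs use more careful identifier schemes. In particular, this sketch cannot place a new character between two concurrent inserts that received the same fraction.

```python
from fractions import Fraction


class Replica:
    def __init__(self, site: int):
        self.site = site
        self.clock = 0
        self.chars = {}          # id -> character; an id never changes
        self.tombstones = set()  # ids of deleted characters, kept forever

    def _visible(self):
        return [i for i in sorted(self.chars) if i not in self.tombstones]

    def text(self) -> str:
        return "".join(self.chars[i] for i in self._visible())

    def insert(self, index: int, ch: str):
        ids = self._visible()
        lo = ids[index - 1][0] if index > 0 else Fraction(0)
        hi = ids[index][0] if index < len(ids) else Fraction(1)
        self.clock += 1
        # The id sorts between its neighbours; site and clock make it unique.
        op = ("ins", ((lo + hi) / 2, self.site, self.clock), ch)
        self.apply(op)
        return op

    def delete(self, index: int):
        op = ("del", self._visible()[index], None)
        self.apply(op)
        return op

    def apply(self, op):
        kind, ident, ch = op
        if kind == "ins":
            self.chars[ident] = ch
        else:
            self.tombstones.add(ident)  # safe even if the insert arrives later


a, b = Replica(site=1), Replica(site=2)
seed = []
for i, ch in enumerate("ac"):
    seed.append(a.insert(i, ch))
for op in seed:
    b.apply(op)

# Concurrent edits: A inserts X, B inserts Y at the same spot and deletes "a".
op_x = a.insert(1, "X")
op_y = b.insert(1, "Y")
op_d = b.delete(0)

# Deliver remote ops in different orders on each side.
a.apply(op_d)
a.apply(op_y)
b.apply(op_x)

print(a.text(), b.text())  # XYc XYc
```

Because state is just a growing set of identified characters plus a growing set of tombstones, applying the same ops in any order yields the same sets and therefore the same text. The per-character ids and the never-removed tombstones are the metadata overhead mentioned above.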

Name both; pick OT (server-coordinated) or CRDT (decentralized/offline) per the requirements.

Step 4 — Architecture

clients ⇄ WebSocket ⇄ collaboration servers ⇄ (OT engine / CRDT merge)
                            │ persist ops
                            ▼
                     document store (ops log + periodic snapshots)

WebSocket connections carry ops both ways in real time.
A collaboration server owns a given document (or a shard of docs), applies the OT/CRDT logic, broadcasts transformed ops to all editors, and persists them.
Presence (cursors, who’s online) rides the same channel.
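
The server's role for one document can be sketched without any networking. Everything here is an assumption for illustration: DocSession, submit, and the base_rev parameter are hypothetical names, the subscribers list stands in for open WebSocket connections, and only inserts are handled. Each client sends its op along with the last revision it had seen; the server transforms the op against what the client missed, applies it, logs it, and broadcasts the result.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical client id, used to break position ties


def transform(op: Insert, other: Insert) -> Insert:
    # Shift op right if other landed before it.
    if other.pos < op.pos or (other.pos == op.pos and other.site < op.site):
        return Insert(op.pos + len(other.text), op.text, op.site)
    return op


@dataclass
class DocSession:
    text: str = ""
    log: list = field(default_factory=list)          # log[i] produced revision i + 1
    subscribers: list = field(default_factory=list)  # stand-ins for WebSocket sends

    def submit(self, base_rev: int, op: Insert) -> int:
        # Transform against every op the client had not seen yet.
        for missed in self.log[base_rev:]:
            op = transform(op, missed)
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.log.append(op)  # a real server would persist this
        rev = len(self.log)
        for send in self.subscribers:
            send(rev, op)    # broadcast the transformed op
        return rev


session = DocSession(text="abcd")
received = []
session.subscribers.append(lambda rev, op: received.append((rev, op)))

# Both clients edited against revision 0.
session.submit(0, Insert(2, "X", site=1))
session.submit(0, Insert(2, "Y", site=2))

print(session.text)  # abXYcd
```

Clients receive ops already expressed against the server's order, which is why a single owner per document keeps the client side simple.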

Step 5 — Persistence and history

Store the operation log (append-only) plus periodic snapshots so you can reconstruct any version quickly: load the nearest earlier snapshot and replay the ops after it. This gives version history cheaply and a starting point for undo. Snapshots cap replay cost at the number of ops between snapshots.
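
A small in-memory sketch of that layout, assuming insert-only ops stored as (pos, text) pairs and a hypothetical DocumentStore class with a snapshot_every setting; a real store would write both structures to durable storage.

```python
from bisect import bisect_right


class DocumentStore:
    def __init__(self, snapshot_every: int = 3):
        self.snapshot_every = snapshot_every
        self.ops = []             # append-only; ops[i] takes revision i to i + 1
        self.snapshots = {0: ""}  # revision -> full text at that revision
        self.head = ""

    def append(self, pos: int, text: str) -> int:
        self.head = self.head[:pos] + text + self.head[pos:]
        self.ops.append((pos, text))
        rev = len(self.ops)
        if rev % self.snapshot_every == 0:
            self.snapshots[rev] = self.head
        return rev

    def at_revision(self, rev: int) -> str:
        # Start from the latest snapshot at or before rev, then replay.
        revs = sorted(self.snapshots)
        start = revs[bisect_right(revs, rev) - 1]
        doc = self.snapshots[start]
        for pos, text in self.ops[start:rev]:
            doc = doc[:pos] + text + doc[pos:]
        return doc


store = DocumentStore(snapshot_every=3)
for ch in "hello":
    store.append(len(store.head), ch)

print(store.at_revision(2))  # he
print(store.at_revision(4))  # hell
```

Reading revision 4 here replays one op on top of the snapshot at revision 3 instead of four ops from the empty document, which is the replay cap in miniature.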

Step 6 — Scale and routing

One server owns a doc at a time (so ops have a single ordering point) — route all editors of a doc to the same server (consistent hashing on doc id); on failure, another server takes over from the persisted log.
Most docs have few concurrent editors, so this shards naturally across many docs.
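
Routing by consistent hashing can be sketched as a hash ring. The Ring class, the server names, and the vnodes count are hypothetical; the hash comes from Python's standard hashlib. Every gateway that builds the same ring sends a given doc id to the same collaboration server, and removing a server only moves the docs that server owned.

```python
import hashlib
from bisect import bisect_right


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, servers, vnodes: int = 64):
        # Several points per server smooth out the distribution.
        self.points = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.points]

    def owner(self, doc_id: str) -> str:
        # First point clockwise from the doc's hash, wrapping at the end.
        idx = bisect_right(self.keys, _hash(doc_id)) % len(self.points)
        return self.points[idx][1]


ring = Ring(["collab-1", "collab-2", "collab-3"])
docs = [f"doc-{i}" for i in range(200)]
before = {d: ring.owner(d) for d in docs}

# collab-2 fails; rebuild the ring without it.
smaller = Ring(["collab-1", "collab-3"])
after = {d: smaller.owner(d) for d in docs}

moved = [d for d in docs if before[d] != after[d]]
print(all(before[d] == "collab-2" for d in moved))  # True
```

The new owner of a moved doc would then rebuild its state from the persisted log and snapshots before accepting ops.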

Trade-offs to raise

OT (central, compact, hard to implement) vs CRDT (decentralized, offline-friendly, heavier metadata).
Single-owner-per-doc (simple ordering, failover needed) vs multi-master (complex).
Op log vs snapshots — log for fidelity, snapshots for fast load (use both).

The interview cue

“Edits are operations reconciled with OT (server-coordinated) or CRDTs (if we want offline/P2P); clients connect over WebSockets to a collaboration server that owns the doc, transforms and broadcasts ops, and persists an **op log + snapshots** for history. Route all editors of a doc to one server for a single ordering point.” Concurrent-edit convergence (OT/CRDT) is the entire crux.