Building a collaborative editor (Google Docs)
Implement operational transformation against a server-ordered op log, the WebSocket sync loop with versioning, snapshots, and presence.
Modeling edits as operations
Every edit is an operation with a position and a base version (the doc revision it was made against):
# op fields: type ("insert" | "delete"), pos (int), text (str, inserts only),
#            len (int, deletes only), base_version (int)
def apply(doc, op):
    if op.type == "insert":
        return doc[:op.pos] + op.text + doc[op.pos:]
    # delete: drop op.len characters starting at op.pos
    return doc[:op.pos] + doc[op.pos + op.len:]
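A minimal runnable sketch of the same model, assuming a dataclass for the operation. The class name `Op`, the helper name `apply_op`, and the field name `length` (standing in for `len` above, to avoid shadowing the builtin) are illustrative choices, not part of the original design.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Op:
    type: str                    # "insert" or "delete"
    pos: int                     # character offset in the document
    base_version: int            # revision the author was looking at
    text: Optional[str] = None   # inserted text (insert only)
    length: int = 0              # number of characters removed (delete only)

def apply_op(doc: str, op: Op) -> str:
    """Return a new document string with `op` applied."""
    if op.type == "insert":
        return doc[:op.pos] + op.text + doc[op.pos:]
    if op.type == "delete":
        return doc[:op.pos] + doc[op.pos + op.length:]
    raise ValueError(f"unknown op type: {op.type}")

doc = "hello world"
doc = apply_op(doc, Op("insert", pos=5, base_version=0, text=","))
doc = apply_op(doc, Op("delete", pos=0, base_version=1, length=1))
print(doc)
```

Strings are immutable here, so every edit builds a new document. A production editor would more likely use a rope or piece table, but the operation shape stays the same.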
Operational transformation (the core)
When an op arrives based on an older version, transform it against every op applied since, so its position is adjusted:
def transform(incoming, applied):
    # adjust `incoming` to account for `applied` having happened first
    if incoming.type == "insert" and applied.type == "insert":
        # ties go to the already-applied op here; a symmetric rule (e.g. site id)
        # is needed when clients run the same function
        if applied.pos <= incoming.pos:
            incoming.pos += len(applied.text)  # shift right past the earlier insert
    elif applied.type == "delete" and applied.pos < incoming.pos:
        # shift left for the earlier delete, but never past the start of the deleted range
        incoming.pos -= min(applied.len, incoming.pos - applied.pos)
    # ... (delete-vs-insert, delete-vs-delete, tie-breaking by site id)
    return incoming
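To make the tie-breaking point concrete, here is a small sketch restricted to concurrent inserts. It checks that both application orders produce the same text once each op is transformed against the other. The `Insert` class, the `site` field, and the "lower site id goes first" rule are assumptions for illustration. Any deterministic, symmetric rule would do.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int   # hypothetical author/site id, used only to break position ties

def transform_insert(incoming: Insert, applied: Insert) -> Insert:
    """Return `incoming` rewritten as if `applied` had already happened."""
    applied_first = applied.pos < incoming.pos or (
        applied.pos == incoming.pos and applied.site < incoming.site
    )
    if applied_first:
        return replace(incoming, pos=incoming.pos + len(applied.text))
    return incoming

def apply_insert(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]

doc = "abc"
a = Insert(pos=1, text="X", site=1)
b = Insert(pos=1, text="Y", site=2)   # same position, made concurrently

# Order 1: a first, then b transformed against a.
left = apply_insert(apply_insert(doc, a), transform_insert(b, a))
# Order 2: b first, then a transformed against b.
right = apply_insert(apply_insert(doc, b), transform_insert(a, b))

print(left, right)
```

Without the site-id comparison, both sides would shift on a tie and the two replicas would end up with the inserted characters in opposite orders.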
The server is the single ordering authority: it transforms each incoming op against every op committed since that op’s base version, assigns it the next sequential version, applies it, and broadcasts it.
def on_op(doc_id, op):
    doc = sessions[doc_id]
    for past in doc.ops[op.base_version:]:  # ops the client hadn't seen
        op = transform(op, past)
    op.version = len(doc.ops)  # 0-based index of this op in the log
    doc.text = apply(doc.text, op)
    doc.ops.append(op)
    persist(doc_id, op)  # append to op log
    broadcast(doc_id, op, exclude=op.author)  # send transformed op to others
    # the author still needs an ack carrying op.version so it can drop the op from `pending`
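The server loop can be exercised in memory. The sketch below is insert-only and assumes hypothetical names (`Insert`, `DocSession`). It also adds a `base_version` bounds check that the snippet above leaves out, and omits persistence and broadcast.

```python
from dataclasses import dataclass, field, replace
from typing import List

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    base_version: int     # number of committed ops the author had seen
    author: str
    version: int = -1     # assigned by the server on commit

def transform(incoming: Insert, applied: Insert) -> Insert:
    # Server-side rule in this sketch: the already-committed op wins position ties.
    if applied.pos <= incoming.pos:
        return replace(incoming, pos=incoming.pos + len(applied.text))
    return incoming

@dataclass
class DocSession:
    text: str = ""
    ops: List[Insert] = field(default_factory=list)

    def on_op(self, op: Insert) -> Insert:
        if not 0 <= op.base_version <= len(self.ops):
            raise ValueError("base_version is outside the committed log")
        for past in self.ops[op.base_version:]:    # ops the author had not seen
            op = transform(op, past)
        op = replace(op, version=len(self.ops))    # next slot in the log
        self.text = self.text[:op.pos] + op.text + self.text[op.pos:]
        self.ops.append(op)
        return op                                   # what would be persisted and broadcast

session = DocSession(text="helo")
first = session.on_op(Insert(pos=3, text="l", base_version=0, author="alice"))
second = session.on_op(Insert(pos=4, text="!", base_version=0, author="bob"))
print(session.text, second.pos, second.version)
```

Bob’s op was written against version 0, so the server shifts it past Alice’s committed insert before applying it.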
The client sync loop
Clients apply local edits immediately (optimistic) and reconcile with the server’s authoritative stream, transforming their pending ops against incoming ones:
# on local edit: apply locally, send to server, keep in `pending`
# on server op: transform it against `pending` (and `pending` against it), then apply it
# on ack: drop the acknowledged op from `pending`
This is what makes typing feel instant while still converging: local echo appears immediately, and the server’s ordering is reconciled continuously.
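A minimal sketch of that client loop, restricted to inserts and assuming a symmetric site-id tie-break on both client and server. The `Client` class, its method names, and the `site` field are hypothetical. Real clients commonly keep only one op in flight at a time, which this sketch does not enforce.

```python
from dataclasses import dataclass, field, replace
from typing import List

@dataclass(frozen=True)
class Insert:
    pos: int
    text: str
    site: int  # hypothetical author id, used only to break position ties

def transform(op: Insert, other: Insert) -> Insert:
    """Rewrite `op` as if `other` had been applied first (inserts only)."""
    other_first = other.pos < op.pos or (other.pos == op.pos and other.site < op.site)
    return replace(op, pos=op.pos + len(other.text)) if other_first else op

def apply_insert(doc: str, op: Insert) -> str:
    return doc[:op.pos] + op.text + doc[op.pos:]

@dataclass
class Client:
    site: int
    text: str
    pending: List[Insert] = field(default_factory=list)

    def local_edit(self, pos: int, text: str) -> Insert:
        op = Insert(pos, text, self.site)
        self.text = apply_insert(self.text, op)  # optimistic local echo
        self.pending.append(op)                  # unacknowledged until the server confirms
        return op                                # the caller would send this to the server

    def on_server_op(self, op: Insert) -> None:
        rebased = []
        for mine in self.pending:
            rebased.append(transform(mine, op))  # pending op now assumes `op` happened
            op = transform(op, mine)             # server op now assumes `mine` happened
        self.pending = rebased
        self.text = apply_insert(self.text, op)

    def on_ack(self) -> None:
        self.pending.pop(0)                      # server committed our oldest pending op

# Bob types "Y" while Alice's "X" (same position) is already committed on the server.
bob = Client(site=2, text="abc")
sent = bob.local_edit(1, "Y")
alice_op = Insert(pos=1, text="X", site=1)

server_text = apply_insert("abc", alice_op)
server_text = apply_insert(server_text, transform(sent, alice_op))

bob.on_server_op(alice_op)
bob.on_ack()
print(bob.text, server_text)
```

Bob saw his own character immediately, and after reconciling Alice’s op his copy matches the server’s.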
Transport and presence
A WebSocket per editor carries ops both ways. Presence (cursor positions, who’s online) is broadcast on the same channel as lightweight ephemeral messages (not persisted). Cursor positions are also transformed as text shifts, so your collaborators’ cursors stay in the right place.
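Cursor adjustment follows the same arithmetic as op transformation. The sketch below assumes a cursor is a plain character offset, and that an insert landing exactly at the cursor leaves the cursor where it is. That tie rule is a convention chosen for illustration. The function name and parameters are hypothetical.

```python
def transform_cursor(cursor: int, op_type: str, pos: int, length: int) -> int:
    """Return where `cursor` should sit after a remote op is applied.

    `length` is the number of characters inserted or deleted at `pos`.
    """
    if op_type == "insert":
        # Text inserted strictly before the cursor pushes it right.
        return cursor + length if pos < cursor else cursor
    if op_type == "delete":
        if pos >= cursor:
            return cursor                    # deletion entirely after the cursor
        # Shift left, but clamp to the start of the deleted range.
        return max(pos, cursor - length)
    raise ValueError(f"unknown op type: {op_type}")

print(transform_cursor(5, "insert", 2, 3))   # insert before the cursor
print(transform_cursor(5, "delete", 3, 10))  # deletion swallows the cursor position
```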
Persistence: op log + snapshots
# every N ops or T seconds, snapshot the materialized doc
def snapshot(doc_id, doc):
    # version = number of ops already folded into `text`
    store.put_snapshot(doc_id, version=len(doc.ops), text=doc.text)

# load = latest snapshot + replay of every op with op.version >= snapshot version
def load(doc_id):
    snap = store.latest_snapshot(doc_id)
    return replay(snap.text, store.ops_after(doc_id, snap.version))
Snapshots cap how many ops you replay on load; the op log preserves full history and is the basis for undo.
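A single-document, in-memory sketch of the snapshot-plus-replay idea. `InMemoryStore`, `SNAPSHOT_EVERY`, and the `(pos, text)` tuple encoding of insert ops are all assumptions made for the example; a real store would be a database or object store.

```python
SNAPSHOT_EVERY = 3  # hypothetical threshold: snapshot after every 3 ops

class InMemoryStore:
    def __init__(self):
        self.ops = []        # ops[i] has version i
        self.snapshots = []  # (version, text); version = number of ops folded in

    def append_op(self, op):
        self.ops.append(op)

    def put_snapshot(self, version, text):
        self.snapshots.append((version, text))

    def latest_snapshot(self):
        # An empty document at version 0 stands in when no snapshot exists yet.
        return self.snapshots[-1] if self.snapshots else (0, "")

    def ops_after(self, version):
        return self.ops[version:]  # every op with op version >= snapshot version

def replay(text, ops):
    for pos, inserted in ops:
        text = text[:pos] + inserted + text[pos:]
    return text

def load(store):
    version, text = store.latest_snapshot()
    tail = store.ops_after(version)
    return replay(text, tail), len(tail)

store = InMemoryStore()
text = ""
for ch in "collab":                      # six single-character inserts
    op = (len(text), ch)
    text = replay(text, [op])
    store.append_op(op)
    if len(store.ops) % SNAPSHOT_EVERY == 0:
        store.put_snapshot(len(store.ops), text)
store.append_op((6, "!"))                # one more op after the last snapshot

loaded, replayed = load(store)
print(loaded, replayed)
```

Only the single op written after the last snapshot is replayed on load, even though the log holds seven.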
Routing and failover
Route all editors of a doc to one collaboration server (consistent hashing on doc id) so there’s a single op-ordering point. On server failure, another node takes ownership and rebuilds state from the persisted op log + latest snapshot; clients reconnect (WebSocket) and resync from their last known version.
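Doc-to-server affinity via consistent hashing might look like the sketch below. The `HashRing` class, the virtual-node count, and the server names are hypothetical. SHA-256 is used only as a stable hash, not for security.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable across processes, unlike Python's built-in hash() for strings.
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node gets `vnodes` points on the ring to smooth the distribution.
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def owner(self, doc_id: str) -> str:
        # First ring point clockwise from the doc's hash, wrapping at the end.
        i = bisect.bisect_right(self._keys, _hash(doc_id)) % len(self._ring)
        return self._ring[i][1]

doc_ids = [f"doc-{i}" for i in range(200)]

ring = HashRing(["collab-1", "collab-2", "collab-3"])
before = {d: ring.owner(d) for d in doc_ids}

# Simulate collab-3 failing: only its documents should change owner.
survivors = HashRing(["collab-1", "collab-2"])
after = {d: survivors.owner(d) for d in doc_ids}
moved = [d for d in doc_ids if before[d] != after[d]]
print(len(moved), "of", len(doc_ids), "docs changed owner")
```

Documents owned by the surviving servers keep their owner, so a failure only forces state rebuilds for the failed node’s documents.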
Scaling and failure handling
- Many docs shard across servers; each doc’s concurrency is usually small.
- A hot doc (hundreds of editors) stays on one server but can fan broadcast via a pub/sub layer to edge connection nodes.
- Client disconnect/offline → queue local ops, resync on reconnect (transform against everything missed). CRDTs make this cleaner (commutative ops, no central transform) at the cost of per-char metadata.
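- Network reorder → a single WebSocket delivers messages in order, but reconnects and pub/sub fan-out can still duplicate or reorder them. Version numbers let a client detect gaps and apply server ops strictly in version order; transformation then resolves the concurrency, and convergence holds as long as the transform functions are correct.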
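One way a client could use version numbers to tolerate duplicated or reordered deliveries is a small reorder buffer, sketched below. The `OrderedInbox` name and interface are assumptions. Ops are opaque values here, and transformation would happen after they are released in order.

```python
class OrderedInbox:
    """Release server ops strictly in version order, holding any that arrive early."""

    def __init__(self, next_version: int = 0):
        self.next_version = next_version  # the version we expect to apply next
        self._held = {}                   # version -> op, for ops that arrived early

    def receive(self, version: int, op):
        """Return the list of ops that are now safe to apply, in order."""
        if version < self.next_version:
            return []                     # duplicate of something already applied
        self._held[version] = op
        ready = []
        while self.next_version in self._held:
            ready.append(self._held.pop(self.next_version))
            self.next_version += 1
        return ready

inbox = OrderedInbox()
early = inbox.receive(1, "op-b")     # arrives before version 0: held back
caught_up = inbox.receive(0, "op-a") # fills the gap: both are released
duplicate = inbox.receive(0, "op-a") # redelivery is ignored
print(early, caught_up, duplicate)
```

If a gap persists, a client would typically ask the server for the missing range by version instead of waiting indefinitely.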
The takeaway
Concrete signals: edits as versioned operations, server-ordered OT that transforms concurrent ops to converge, optimistic local apply for instant typing, op log + snapshots for history and fast load, doc-to-server affinity for a single ordering point, and CRDTs as the offline/P2P alternative. Convergence under concurrency is the whole game.