System design course
Ch.4 · Designing real systems · concept · 8 min read

Designing an online code editor

A browser IDE that runs untrusted code — the security-first problem of sandboxed execution, resource limits, and scaling a fleet of ephemeral runners.


The problem

Design an online code editor / runner (Replit, LeetCode’s judge, CodeSandbox): users write code in the browser and execute it server-side, seeing output. The defining challenge is running untrusted, arbitrary code safely at scale — this is fundamentally a security and isolation problem.

Step 1 — Requirements

Functional: edit code in the browser; run it (multiple languages) and stream back output/errors; support stdin; enforce time/memory limits; optionally persist projects and collaborate.

Non-functional: security/isolation (untrusted code must not harm the host or other users — the top priority), scalability (many concurrent runs), low latency to start a run, and fair resource allocation.

Step 2 — The core challenge: safe execution

Running arbitrary user code means assuming it is malicious: it may try to read other users' data, exhaust CPU or memory, launch a fork bomb, make network calls, or escape to the host. Defenses, layered:

  • Containers / micro-VMs: run each execution in an isolated container (Docker, or gVisor for a stronger boundary) or in a lightweight micro-VM (Firecracker). The micro-VM gives VM-grade isolation with near-container startup; of these options it is the stronger boundary and a common choice for this exact problem.
  • Resource limits (cgroups): cap CPU, memory, and process count (PIDs); add disk quotas and a hard wall-clock timeout that kills runaway code.
  • Drop privileges & restrict syscalls: run as non-root, use seccomp to allowlist syscalls, mount the filesystem read-only, and expose no host mounts.
  • Network isolation: no outbound network by default (or a strict allowlist).
  • Ephemeral: destroy the sandbox after each run so nothing persists between users.
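
As a rough illustration of how these layers could map onto a container runtime, the sketch below builds a `docker run` command line in Python. The function name `docker_run_argv`, the default limits, the image name in the usage note, and the `65534:65534` user are assumptions chosen for the example, not values from this chapter. It only builds the argument list; it does not start anything.

```python
from typing import List, Sequence


def docker_run_argv(
    image: str,
    command: Sequence[str],
    name: str,
    memory: str = "256m",   # assumed default, tune per language runtime
    cpus: str = "0.5",      # assumed default
    pids_limit: int = 64,   # assumed default
) -> List[str]:
    """Build a `docker run` command line for one throwaway execution."""
    return [
        "docker", "run",
        "--rm",                                # ephemeral: remove container on exit
        "--name", name,                        # lets a supervisor kill it by name
        "-i",                                  # keep stdin open for piped input
        "--network", "none",                   # network isolation
        "--read-only",                         # read-only root filesystem
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=64m",  # small writable scratch dir
        "--memory", memory,                    # cgroup memory cap
        "--memory-swap", memory,               # equal to --memory: no extra swap
        "--cpus", cpus,                        # cgroup CPU cap
        "--pids-limit", str(pids_limit),       # bounds fork bombs
        "--cap-drop", "ALL",                   # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--user", "65534:65534",               # non-root (commonly 'nobody')
        image,
        *command,
    ]
```

For example, `docker_run_argv("python:3.12-slim", ["python", "-c", "print(1)"], name="run-abc")` yields a list that can be passed to `subprocess.run`. Docker applies its default seccomp profile unless told otherwise, and a stricter one can be supplied with `--security-opt seccomp=<profile file>`. The hard wall-clock timeout is not covered by these flags: the worker would still need to kill the container by name when the deadline passes.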

Step 3 — Architecture

browser editor → API → job queue → execution workers (sandbox pool)
                                         │ run in container/micro-VM
                                         └─ stream stdout/stderr back (WebSocket)
projects ──▶ object store / DB (saved code)
  • The API validates and enqueues a run; execution workers pull jobs, spin up a sandbox, run with limits, and stream output back over a WebSocket.
  • A warm pool of pre-initialized sandboxes cuts cold-start latency.
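
The worker and warm-pool interaction can be sketched with an in-memory simulation. Everything here is hypothetical scaffolding: `Sandbox`, `WarmPool`, and `drain` are illustrative names, the queue is a plain `deque` rather than a real broker, and `_boot` stands in for starting an actual container or micro-VM.

```python
import itertools
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple


@dataclass
class Sandbox:
    sandbox_id: int
    warm: bool  # True if it was booted ahead of time


class WarmPool:
    """Keeps `target_size` pre-booted sandboxes ready for incoming jobs."""

    def __init__(self, target_size: int) -> None:
        self.target_size = target_size
        self._ids = itertools.count(1)
        self._idle: Deque[Sandbox] = deque()
        self.refill()

    def _boot(self, warm: bool) -> Sandbox:
        # Placeholder for starting a real container or micro-VM.
        return Sandbox(next(self._ids), warm)

    def refill(self) -> None:
        while len(self._idle) < self.target_size:
            self._idle.append(self._boot(warm=True))

    def acquire(self) -> Sandbox:
        if self._idle:
            return self._idle.popleft()   # fast path: already booted
        return self._boot(warm=False)     # pool empty: pay the cold start

    def idle_count(self) -> int:
        return len(self._idle)


def drain(
    jobs: Deque[str],
    pool: WarmPool,
    execute: Callable[[Sandbox, str], str],
) -> List[Tuple[str, bool, str]]:
    """Run every queued job in its own sandbox; sandboxes are never reused."""
    results = []
    while jobs:
        job = jobs.popleft()
        sandbox = pool.acquire()
        results.append((job, sandbox.warm, execute(sandbox, job)))
        # The sandbox is dropped here instead of being returned to the pool.
    pool.refill()  # top the pool back up for the next burst
    return results
```

With a pool of two and three queued jobs, the first two runs get warm sandboxes and the third pays a cold start, which is the latency versus idle-cost trade-off the pool size controls.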

Step 4 — Streaming output

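Output has to stream live, because a long-running program prints incrementally. Workers push stdout/stderr to the client as it is produced, over a WebSocket or SSE. Interactive programs also need stdin forwarded the other way; a WebSocket carries both directions, whereas SSE is server-to-client only, so input would have to travel over a separate request.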
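
A minimal sketch of the worker side of this, using Python's `asyncio` subprocess support. The `send` callback is a hypothetical stand-in for whatever pushes a message over the WebSocket, and `run_and_stream` is an illustrative name; in a real worker `argv` would be the sandbox launch command rather than a bare interpreter.

```python
import asyncio
import sys
from typing import Awaitable, Callable, Sequence

# (stream name, text chunk) -> awaitable; stands in for a WebSocket send.
Send = Callable[[str, str], Awaitable[None]]


async def _pump(reader: asyncio.StreamReader, name: str, send: Send) -> None:
    """Forward chunks from one pipe as soon as they arrive."""
    while True:
        chunk = await reader.read(4096)  # returns b"" at end of stream
        if not chunk:
            break
        await send(name, chunk.decode(errors="replace"))


async def run_and_stream(argv: Sequence[str], send: Send) -> int:
    """Run a program and stream its stdout/stderr; returns the exit code."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Read both pipes concurrently so neither can fill up and block the child.
    await asyncio.gather(
        _pump(proc.stdout, "stdout", send),
        _pump(proc.stderr, "stderr", send),
    )
    return await proc.wait()
```

Reading fixed-size chunks rather than whole lines means a prompt printed without a trailing newline still reaches the browser promptly. Note that many runtimes buffer stdout when it is a pipe, so the child may need unbuffered output (for Python, the `-u` flag) for the stream to feel live.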

Step 5 — Persistence and collaboration

  • Projects (files) saved in object storage / a DB, loaded into the sandbox on start.
  • Collaboration (if needed) reuses the Google Docs OT/CRDT approach for the editor, separate from execution.
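
Loading saved files into a sandbox is itself an attack surface, since file names come from users. The sketch below assumes a project is available as a mapping from relative path to text content (the storage layer is out of scope here) and writes it into a working directory while rejecting paths that would escape it. `materialize_project` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def materialize_project(files: Dict[str, str], root: Path) -> int:
    """Write project files under `root`; returns the number of files written.

    Raises ValueError for any path that resolves outside `root`
    (for example '../x' or an absolute path).
    """
    root = root.resolve()
    # Validate every path first so a bad entry leaves nothing half-written.
    targets = []
    for rel_path, content in files.items():
        target = (root / rel_path).resolve()
        if root not in target.parents:
            raise ValueError(f"path escapes project root: {rel_path!r}")
        targets.append((target, content))

    for target, content in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
    return len(targets)
```

Validating all paths before writing any of them keeps a rejected project from leaving partial state in the sandbox directory.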

Step 6 — Scale and fairness

  • Horizontal worker fleet behind the queue; autoscale on queue depth.
  • Per-user quotas / rate limits so one user can’t hog the fleet (reuse the rate limiter); a job timeout frees stuck sandboxes.
  • Bin-packing runs onto hosts while respecting isolation.
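
Two of these controls are simple enough to sketch directly: a scaling target derived from queue depth, and a per-user cap on concurrent runs. The function and class names, the default bounds, and the choice of a concurrency cap (rather than a token-bucket rate limiter) are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import DefaultDict


def desired_workers(
    queue_depth: int,
    running: int,
    jobs_per_worker: int = 1,
    min_workers: int = 2,     # assumed floor so there is always spare capacity
    max_workers: int = 200,   # assumed ceiling to bound cost
) -> int:
    """Target fleet size: enough workers for queued plus running jobs, clamped."""
    needed = math.ceil((queue_depth + running) / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


class ConcurrencyQuota:
    """Caps how many runs each user may have in flight at once."""

    def __init__(self, max_per_user: int) -> None:
        self.max_per_user = max_per_user
        self._active: DefaultDict[str, int] = defaultdict(int)

    def try_start(self, user_id: str) -> bool:
        if self._active[user_id] >= self.max_per_user:
            return False  # over quota: reject or delay the run
        self._active[user_id] += 1
        return True

    def finish(self, user_id: str) -> None:
        # Must also be called when a job times out, or the slot leaks.
        if self._active[user_id] > 0:
            self._active[user_id] -= 1
```

For instance, `desired_workers(10, 5, jobs_per_worker=4)` gives 4, and a quota of 2 rejects a user's third concurrent run until one of the first two finishes.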

Trade-offs to raise

  • Container (fast, lighter isolation) vs micro-VM/gVisor (stronger, slightly heavier) vs full VM (strongest, slow) — pick by threat model; micro-VMs are the sweet spot for untrusted code.
  • Warm pool (low latency, idle cost) vs cold start (cheaper, slower).
  • Persistent vs ephemeral sandboxes — ephemeral is safer; persistent is faster for iterative dev (with careful reset).

The interview cue

“Treat all user code as hostile: run each execution in an ephemeral micro-VM / sandboxed container with cgroup limits, seccomp, no network, and a hard timeout; a queue + autoscaled worker fleet with a warm pool for fast starts; stream stdout over WebSocket; per-user quotas for fairness.” Isolation and resource limits are the heart — say security first.