Designing an online code editor

A browser IDE that runs untrusted code — the security-first problem of sandboxed execution, resource limits, and scaling a fleet of ephemeral runners.

The problem

Design an online code editor / runner (Replit, LeetCode’s judge, CodeSandbox): users write code in the browser and execute it server-side, seeing output. The defining challenge is running untrusted, arbitrary code safely at scale — this is fundamentally a security and isolation problem.

Step 1 — Requirements

Functional: edit code in the browser; run it (multiple languages) and stream back output/errors; support stdin; enforce time/memory limits; optionally persist projects and collaborate.

Non-functional: security/isolation (untrusted code must not harm the host or other users — the top priority), scalability (many concurrent runs), low latency to start a run, and fair resource allocation.

Step 2 — The core challenge: safe execution

Running arbitrary user code means assuming it is malicious: it may try to read other users’ data, exhaust CPU or memory, launch fork bombs, make network calls, or escape to the host. The defenses are layered:

Containers / micro-VMs: run each execution in an isolated container (Docker, or gVisor for a stronger boundary) or a lightweight micro-VM (Firecracker). The micro-VM gives VM-grade isolation with near-container startup time, which makes it the strongest of these options and a common choice for this exact problem.
Resource limits: use cgroups to cap CPU, memory, and process count (PIDs); have the worker enforce a hard wall-clock timeout to kill runaway code; apply disk quotas.
Drop privileges and restrict syscalls: run as non-root, use seccomp to allowlist syscalls, mount the filesystem read-only, and expose no host mounts.
Network isolation: no outbound network by default (or a strict allowlist).
Ephemeral: destroy the sandbox after each run so nothing persists between users.
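As a rough illustration of how these layers combine, the sketch below builds a `docker run` argument list that applies most of them at once. The `SandboxLimits` type, the helper name, the image name, and every limit value are assumptions made for this example, not recommendations. The wall-clock timeout is not covered here: the worker still has to stop the container itself when the deadline passes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxLimits:
    """Hypothetical per-run limits; the defaults are illustrative only."""
    memory_mb: int = 256
    cpus: float = 0.5
    max_pids: int = 64
    tmpfs_mb: int = 64


def build_docker_argv(image: str, command: list[str], limits: SandboxLimits) -> list[str]:
    """Return a `docker run` argument list applying the layered defenses."""
    return [
        "docker", "run",
        "--rm",                                  # ephemeral: remove the container on exit
        "--network", "none",                     # no network access at all
        "--read-only",                           # read-only root filesystem
        "--tmpfs", f"/tmp:rw,size={limits.tmpfs_mb}m",  # small writable scratch space
        "--memory", f"{limits.memory_mb}m",      # cgroup memory cap
        "--cpus", str(limits.cpus),              # cgroup CPU cap
        "--pids-limit", str(limits.max_pids),    # bounds process count (fork bombs)
        "--user", "65534:65534",                 # unprivileged uid:gid instead of root
        "--cap-drop", "ALL",                     # drop every Linux capability
        "--security-opt", "no-new-privileges",   # block privilege gain via setuid binaries
        image, *command,
    ]


if __name__ == "__main__":
    # "runner-python:latest" is a made-up image name for the example.
    argv = build_docker_argv("runner-python:latest", ["python", "main.py"], SandboxLimits())
    print(" ".join(argv))
```

Docker applies its default seccomp profile unless told otherwise; a stricter allowlist would be supplied as a custom profile. A micro-VM setup would express the same limits through its own configuration rather than these flags.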

Step 3 — Architecture

browser editor → API → job queue → execution workers (sandbox pool)
                                         │ run in container/micro-VM
                                         └─ stream stdout/stderr back (WebSocket)
projects ──▶ object store / DB (saved code)

The API validates and enqueues a run; execution workers pull jobs, spin up a sandbox, run with limits, and stream output back over a WebSocket.
A warm pool of pre-initialized sandboxes cuts cold-start latency.
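The queue, worker, and warm-pool interaction described above can be sketched in a few lines. This is a single-process toy under stated assumptions: `Sandbox`, `WarmPool`, and `drain` are hypothetical names, the sandbox only pretends to run code, and a real fleet would use a durable queue and many worker hosts.

```python
import queue
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)


@dataclass
class Sandbox:
    """Stand-in for a container or micro-VM handle (hypothetical)."""
    sandbox_id: int = field(default_factory=lambda: next(_next_id))

    def run(self, code: str) -> str:
        # A real worker would execute `code` inside the sandbox, with limits.
        return f"sandbox {self.sandbox_id} ran {len(code)} bytes"


class WarmPool:
    """Keeps up to `size` pre-initialized sandboxes ready to use."""

    def __init__(self, size: int) -> None:
        self._size = size
        self._idle = queue.Queue()  # idle, pre-initialized sandboxes
        self.refill()

    def refill(self) -> None:
        while self._idle.qsize() < self._size:
            self._idle.put(Sandbox())

    def idle_count(self) -> int:
        return self._idle.qsize()

    def acquire(self) -> Sandbox:
        try:
            return self._idle.get_nowait()   # warm start
        except queue.Empty:
            return Sandbox()                 # cold start: build one on demand


def drain(jobs: queue.Queue, pool: WarmPool) -> list[str]:
    """Worker loop: pull jobs until the queue is empty, one sandbox per job."""
    results = []
    while True:
        try:
            code = jobs.get_nowait()
        except queue.Empty:
            return results
        sandbox = pool.acquire()
        results.append(sandbox.run(code))
        # The used sandbox is discarded, never returned to the pool,
        # so no state leaks between runs; the pool is topped up instead.
        pool.refill()


if __name__ == "__main__":
    jobs = queue.Queue()
    for source in ["print(1)", "print(22)"]:
        jobs.put(source)
    for line in drain(jobs, WarmPool(size=2)):
        print(line)
```

Refilling right after each run keeps start latency low at the price of idle sandboxes, which is the warm-pool trade-off in miniature.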

Step 4 — Streaming output

Output streams live (a long-running program prints incrementally), so workers push stdout/stderr to the client over a WebSocket (or SSE) as it is produced. For interactive programs, stdin is forwarded the other way; since SSE is one-directional, that needs the WebSocket or a separate request channel.
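A minimal sketch of the worker side of streaming, assuming the sandboxed program is reachable as a child process: an async generator yields each output line as soon as it arrives. The function names are hypothetical, stderr is merged into stdout for brevity, and the WebSocket send is left out; in a real worker each yielded line would be pushed to the client connection.

```python
import asyncio
import sys
from typing import AsyncIterator


async def stream_output(argv: list[str]) -> AsyncIterator[str]:
    """Yield the child's output line by line, as it is produced."""
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,  # simplification: merge stderr into stdout
    )
    assert proc.stdout is not None
    async for raw in proc.stdout:          # StreamReader iterates over lines
        yield raw.decode(errors="replace").rstrip("\r\n")
    await proc.wait()                      # reap the child once output ends


async def collect(argv: list[str]) -> list[str]:
    """Gather all lines; a real worker would forward each one instead."""
    return [line async for line in stream_output(argv)]


if __name__ == "__main__":
    # Run a tiny Python program unbuffered (-u) so lines arrive promptly.
    program = "print('a'); print('b')"
    print(asyncio.run(collect([sys.executable, "-u", "-c", program])))
```

Reading line by line is a design choice for readability; programs that print partial lines (prompts, progress bars) would be better served by forwarding raw chunks.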

Step 5 — Persistence and collaboration

Projects (files) are saved in object storage or a DB and loaded into the sandbox on start.
Collaboration (if needed) uses the standard collaborative-editor approach for the editor (operational transformation, as in Google Docs, or CRDTs) and stays separate from execution.

Step 6 — Scale and fairness

Horizontal worker fleet behind the queue; autoscale on queue depth.
Per-user quotas / rate limits so one user can’t hog the fleet (reuse the rate limiter); a job timeout frees stuck sandboxes.
Bin-packing runs onto hosts while respecting isolation.
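Two of the points above reduce to small calculations, sketched here under assumptions: a per-user token bucket for run quotas and a clamp that turns queue depth into a target worker count. All names and numbers are hypothetical, the clock is passed in explicitly to keep the example deterministic, and a production limiter would normally keep its state in a shared store rather than in process memory.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float        # maximum burst of runs
    refill_per_sec: float  # sustained runs per second
    tokens: float
    updated_at: float

    def allow(self, now: float) -> bool:
        # Add tokens for the time elapsed since the last call, up to capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token for this run
            return True
        return False


class RunQuota:
    """One bucket per user, so a single user cannot hog the fleet."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self._capacity = capacity
        self._refill = refill_per_sec
        self._buckets = {}  # user_id -> TokenBucket

    def allow(self, user_id: str, now: float) -> bool:
        bucket = self._buckets.get(user_id)
        if bucket is None:
            # New users start with a full bucket.
            bucket = TokenBucket(self._capacity, self._refill, self._capacity, now)
            self._buckets[user_id] = bucket
        return bucket.allow(now)


def desired_workers(queue_depth: int, jobs_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Scale on queue depth, clamped to the fleet's floor and ceiling."""
    needed = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, needed))


if __name__ == "__main__":
    quota = RunQuota(capacity=2, refill_per_sec=1.0)
    print([quota.allow("alice", 0.0) for _ in range(3)])  # burst of three at t=0
    print(quota.allow("alice", 1.0))                      # one second later
    print(desired_workers(queue_depth=25, jobs_per_worker=10, min_workers=1, max_workers=50))
```

The bucket permits short bursts while bounding the sustained rate; the clamp keeps a floor of workers for latency and a ceiling for cost.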

Trade-offs to raise

Container (fast, lighter isolation) vs micro-VM/gVisor (stronger, slightly heavier) vs full VM (strongest, slow) — pick by threat model; micro-VMs are the sweet spot for untrusted code.
Warm pool (low latency, idle cost) vs cold start (cheaper, slower).
Persistent vs ephemeral sandboxes — ephemeral is safer; persistent is faster for iterative dev (with careful reset).

The interview cue

“Treat all user code as hostile: run each execution in an ephemeral micro-VM / sandboxed container with cgroup limits, seccomp, no network, and a hard timeout; a queue + autoscaled worker fleet with a warm pool for fast starts; stream stdout over WebSocket; per-user quotas for fairness.” Isolation and resource limits are the heart — say security first.