System design course
Ch.4 · Designing real systems · how to build it · 7 min read

Building an online code editor

Implement the sandboxed run with cgroup/seccomp limits, a warm sandbox pool, live output streaming, and the timeout/cleanup that contains runaway code.


The execution worker

A worker takes a run job, grabs a sandbox, copies in the code, executes with hard limits, streams output, and always tears the sandbox down:

import threading

def run_job(job):
    sandbox = pool.acquire()                 # pre-warmed micro-VM/container
    try:
        sandbox.write_files(job.files)
        proc = sandbox.exec(
            cmd=lang_run_cmd(job.language),
            limits=Limits(cpu="1", mem="256m", pids=64, wall_time="10s"),
            network="none", read_only_root=True, user="nobody", seccomp=PROFILE,
        )
        # Stream in the background: if streaming blocked here, a program that
        # never stops printing would never reach the timeout below.
        threading.Thread(target=stream_output, args=(job.id, proc), daemon=True).start()
        return proc.wait(timeout=10)          # SIGKILL on timeout
    finally:
        pool.destroy(sandbox)                 # ephemeral: never reuse across users

The finally block is the safety net: the sandbox is destroyed whether the run succeeds, fails, or times out, so it is never reused by another user and no state leaks between runs.
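The timeout-and-cleanup path can be tried out with only the standard library. This is a minimal sketch, assuming a plain child process stands in for the sandbox; `run_with_deadline` and `RunResult` are hypothetical names for illustration, not part of any sandbox API.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    exit_code: Optional[int]   # None when the run was killed at the deadline
    timed_out: bool


def run_with_deadline(cmd, wall_time_s):
    """Run cmd and kill it if it outlives wall_time_s seconds.

    The timer lives in the supervising process, so the child cannot
    extend or disable it.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        code = proc.wait(timeout=wall_time_s)
        return RunResult(exit_code=code, timed_out=False)
    except subprocess.TimeoutExpired:
        return RunResult(exit_code=None, timed_out=True)
    finally:
        # Cleanup runs on every path, mirroring the worker's finally block.
        if proc.poll() is None:
            proc.kill()        # SIGKILL on POSIX
        proc.wait()            # reap the child so no zombie is left behind
```

A real worker would apply the same pattern to the sandbox handle rather than a bare process, and would destroy the whole sandbox instead of only killing one PID.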

Enforcing limits (defense in depth)

cgroups:   cpu quota, memory.max (OOM-kill on breach), pids.max (block fork bombs)
seccomp:   whitelist syscalls; block ptrace, mount, network syscalls
namespaces: isolated PID/network/mount/user namespaces
filesystem: read-only root + a small writable tmpfs scratch; no host mounts
wall clock: a hard kill timer outside the sandbox (don't trust in-sandbox timing)

Memory breach → the kernel OOM-kills the process; CPU overrun → throttled then timed out; fork bomb → blocked by the PID cap. All enforced by the host kernel, not the user’s code.
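To make the cgroup row concrete, here is a rough sketch of how the limits from the worker example (1 CPU, 256 MiB, 64 PIDs) could map onto cgroup v2 control files. The file names `cpu.max`, `memory.max`, and `pids.max` are the kernel's; the helper names `cgroup_v2_settings` and `apply_limits` are hypothetical, and a real runner would more likely let its container or micro-VM runtime write these values.

```python
from pathlib import Path


def cgroup_v2_settings(cpu_cores, mem_bytes, max_pids, period_us=100_000):
    """Return {control file: value} for one sandbox's cgroup v2 directory."""
    quota_us = int(cpu_cores * period_us)      # CPU time allowed per period
    return {
        "cpu.max": f"{quota_us} {period_us}",  # format: "<quota> <period>" in microseconds
        "memory.max": str(mem_bytes),          # hard limit; exceeding it triggers the OOM killer
        "pids.max": str(max_pids),             # fork() fails once this many tasks exist
    }


def apply_limits(cgroup_dir, settings):
    """Write each value into its control file (needs privileges on a real cgroup)."""
    for name, value in settings.items():
        (Path(cgroup_dir) / name).write_text(value)


limits = cgroup_v2_settings(cpu_cores=1, mem_bytes=256 * 1024 * 1024, max_pids=64)
# limits == {"cpu.max": "100000 100000", "memory.max": "268435456", "pids.max": "64"}
```

The process is then placed in the group by writing its PID to `cgroup.procs` in the same directory, which is omitted here.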

Warm pool for fast starts

Cold-booting a sandbox per run adds latency. Keep a pool of pre-initialized sandboxes (base image + language runtime loaded) and hand one out instantly; replace each consumed sandbox in the background:

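import threading

class SandboxPool:
    def acquire(self):
        # self.ready is a queue.Queue of booted sandboxes
        sb = self.ready.get()                 # instant if warm; blocks if the pool is empty
        # boot the replacement off the request path
        threading.Thread(target=self._refill, daemon=True).start()
        return sb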
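A fuller, self-contained version of the pool might look like the sketch below. It assumes a caller-supplied `boot` factory that creates one ready sandbox; `WarmPool` and `boot` are illustrative names, and the refill policy (one replacement per acquire) is one reasonable choice among several.

```python
import queue
import threading


class WarmPool:
    """Keeps `size` pre-booted sandboxes ready and boots one replacement per acquire."""

    def __init__(self, boot, size):
        self._boot = boot                     # hypothetical factory: boots one sandbox
        self.ready = queue.Queue()
        for _ in range(size):
            self.ready.put(self._boot())      # pay the cold-start cost up front

    def acquire(self, timeout=None):
        # Returns at once while the pool is warm; otherwise waits for a refill.
        sb = self.ready.get(timeout=timeout)
        threading.Thread(target=self._refill, daemon=True).start()
        return sb

    def _refill(self):
        self.ready.put(self._boot())          # runs off the request path
```

If demand outpaces the refill rate, `acquire` degrades to cold-start latency rather than failing, which is usually the behavior you want under a burst.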

Streaming output live

Programs print incrementally and may read stdin, so don’t buffer to completion — forward in real time over a WebSocket:

def stream_output(job_id, proc):
    for chunk in proc.stdout_stream():
        ws.send(job_id, {"type": "stdout", "data": chunk})
    # stdin flows the other way for interactive runs
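The same forwarding loop can be exercised locally against a real child process. In this sketch a `send` callback stands in for the WebSocket, and `stream_chunks` is a hypothetical helper; only stdout is forwarded, and stderr would need the same treatment in a real runner.

```python
import os
import subprocess


def stream_chunks(cmd, send, chunk_size=4096):
    """Forward the child's stdout to send(message) as bytes arrive."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    try:
        while True:
            # os.read returns as soon as any bytes are available,
            # rather than waiting to fill chunk_size.
            chunk = os.read(fd, chunk_size)
            if not chunk:                  # empty read: the child closed stdout
                break
            send({"type": "stdout", "data": chunk})
    finally:
        proc.stdout.close()
    return proc.wait()                     # exit code once the stream is drained
```

Note that many runtimes switch to block buffering when stdout is a pipe, so output only appears live if the program flushes, runs unbuffered (for example `python -u`), or is attached to a pseudo-terminal.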

The queue and autoscaling

The API enqueues runs and workers pull them. Autoscale the worker fleet on queue depth and CPU utilization, and enforce per-user concurrency limits (alongside a rate limiter on submissions) so one user can’t monopolize capacity. A global cap protects the cluster.
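The per-user limit and the global cap can be combined in one small admission check. This is a minimal in-process sketch; `RunLimiter` is a hypothetical name, and a multi-node deployment would need to keep these counters in shared storage instead of local memory.

```python
import threading
from collections import defaultdict


class RunLimiter:
    """Admits a run only if both the user's limit and the global cap allow it."""

    def __init__(self, per_user, global_cap):
        self.per_user = per_user
        self.global_cap = global_cap
        self._active = defaultdict(int)    # user_id -> runs currently in flight
        self._total = 0
        self._lock = threading.Lock()

    def try_acquire(self, user_id):
        with self._lock:
            if self._total >= self.global_cap:
                return False               # cluster is full: apply backpressure
            if self._active[user_id] >= self.per_user:
                return False               # this user already has enough runs
            self._active[user_id] += 1
            self._total += 1
            return True

    def release(self, user_id):
        with self._lock:
            if self._active[user_id] > 0:  # ignore a release with no matching acquire
                self._active[user_id] -= 1
                self._total -= 1
```

The API would call `try_acquire` before enqueueing and reject or delay the request on `False`; the worker calls `release` in the same `finally` that destroys the sandbox.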

Failure handling and abuse

  • Runaway code → cgroup limits + wall-clock kill contain it; the sandbox is destroyed regardless.
  • Sandbox escape attempt → micro-VM/gVisor boundary + seccomp; defense in depth so one layer failing isn’t catastrophic.
  • Crash mid-run → the queue delivers jobs at-least-once, so the run is retried; reruns are safe because sandboxes are ephemeral and runs have no side effects. If retries are exhausted, surface a clear failure.
  • Resource exhaustion attack (many heavy runs) → per-user quotas + global cap + queue backpressure.
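The crash-mid-run case above can be sketched as a bounded retry loop. This is an illustration under stated assumptions: `WorkerCrashed` is a hypothetical exception that the job runner raises when a worker dies, `execute` is whatever function runs a job in a fresh sandbox, and the attempt limit of 3 is arbitrary.

```python
class WorkerCrashed(Exception):
    """Hypothetical signal that the worker died before the run finished."""


def run_at_least_once(job, execute, max_attempts=3):
    """Call execute(job) until it returns, retrying only on WorkerCrashed."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = execute(job)
            return {"status": "ok", "attempts": attempt, "result": result}
        except WorkerCrashed:
            continue   # safe to rerun: each attempt gets a fresh sandbox
    # Retries exhausted: report a clear failure instead of hanging the user.
    return {
        "status": "failed",
        "attempts": max_attempts,
        "error": "worker crashed repeatedly; run abandoned",
    }
```

Errors raised by the user's own program are not retried here; they are ordinary run results and should be returned to the user as-is.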

The takeaway

Concrete signals: ephemeral sandboxes destroyed after each run, kernel-enforced cgroup/seccomp/namespace limits + wall-clock kill, a warm pool for latency, live streaming, and per-user quotas. The reusable lesson: never trust the workload — isolate it, cap it, and throw the box away. (This is exactly what an in-browser judge like Lyte Code’s own runner does, scaled to a server fleet.)