Designing Pastebin · Lyte Code

Like a URL shortener, but for large text blobs: this is where you learn to keep small metadata in a database and large content in a blob store behind a CDN.

The problem

Design Pastebin: users paste text (often large), get a short URL, and anyone with the link can read the paste. It’s the URL shortener plus a large-content twist — the canonical place to learn metadata-in-DB, blob-in-object-store.

Step 1 — Requirements

Functional: create a paste (text up to a few MB), get a short URL, read it via the URL; optional expiry, visibility (public/unlisted/private), syntax highlighting, and size limits.

Non-functional: read-heavy (reads ≫ writes), low-latency reads, durable storage of potentially large blobs, scalable, available.

Step 2 — Estimate

Say 10M pastes/day → 10M / 86,400 s ≈ 115 writes/sec. Reads maybe 10–100× → roughly 1,000 to 12,000 reads/sec.
Average paste ~10 KB, max a few MB → 10M × 10 KB = ~100 GB/day of content (about 36 TB/year) → don’t put blobs in the relational DB.

That storage number is the whole lesson: large content goes in a blob/object store, not the database.
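The arithmetic above can be checked in a few lines. This is a minimal sketch using the figures from this step; the decimal units (1 GB = 10^9 bytes) and the variable names are assumptions made for illustration.

```python
# Back-of-envelope numbers from the estimate (decimal units: 1 GB = 1e9 bytes).
PASTES_PER_DAY = 10_000_000
AVG_PASTE_BYTES = 10_000           # ~10 KB average paste
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400

writes_per_sec = PASTES_PER_DAY / SECONDS_PER_DAY

# Reads are assumed to be 10x to 100x the write rate.
reads_per_sec_low = writes_per_sec * 10
reads_per_sec_high = writes_per_sec * 100

content_gb_per_day = PASTES_PER_DAY * AVG_PASTE_BYTES / 1e9
content_tb_per_year = content_gb_per_day * 365 / 1000

print(f"writes/sec:  {writes_per_sec:.1f}")
print(f"reads/sec:   {reads_per_sec_low:.0f} to {reads_per_sec_high:.0f}")
print(f"content/day: {content_gb_per_day:.0f} GB")
print(f"content/yr:  {content_tb_per_year:.1f} TB")
```

Metadata, by contrast, is on the order of a hundred bytes per row, which is why it fits comfortably in a database while the content does not.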

Step 3 — The key split: metadata vs content

Metadata (small, queried) → a database: paste_id, owner, created_at, expires_at, visibility, content_url, size.
Content (large, blob) → an object store (S3-style), one object per paste, fronted by a CDN for hot reads.

create: client → app → store blob in S3 → store metadata row (with blob URL) → return short code
read:   client → app → metadata (cache) → fetch blob from CDN/S3

The DB stays small and fast; the object store handles the bulk; the CDN serves popular pastes from the edge.
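The create and read flows above can be sketched with in-memory stand-ins. Everything here is hypothetical: `InMemoryBlobStore` imitates an S3-style object store, a plain dict imitates the metadata table, and the `blob://` URL scheme is invented for the example. The field names follow the metadata list in this step.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class PasteMeta:
    """Small, queryable row; the paste text itself is NOT stored here."""
    paste_id: str
    owner: Optional[str]
    created_at: datetime
    expires_at: Optional[datetime]
    visibility: str
    content_url: str
    size: int


class InMemoryBlobStore:
    """Stand-in for an S3-style object store (hypothetical interface)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> str:
        self._objects[key] = data
        return f"blob://pastes/{key}"  # invented URL scheme for the sketch

    def get(self, url: str) -> bytes:
        return self._objects[url.rsplit("/", 1)[-1]]


class PasteService:
    def __init__(self, blobs: InMemoryBlobStore) -> None:
        self.blobs = blobs
        self.metadata: Dict[str, PasteMeta] = {}  # stand-in for the DB table

    def create(self, paste_id: str, text: str, owner: Optional[str] = None,
               expires_at: Optional[datetime] = None,
               visibility: str = "public") -> str:
        data = text.encode("utf-8")
        # Blob first, metadata second: if the second step fails we are left
        # with an orphan blob (cleanable later), not a row pointing at nothing.
        content_url = self.blobs.put(paste_id, data)
        self.metadata[paste_id] = PasteMeta(
            paste_id=paste_id,
            owner=owner,
            created_at=datetime.now(timezone.utc),
            expires_at=expires_at,
            visibility=visibility,
            content_url=content_url,
            size=len(data),
        )
        return paste_id

    def read(self, paste_id: str) -> Optional[str]:
        meta = self.metadata.get(paste_id)  # small lookup (cacheable)
        if meta is None:
            return None
        return self.blobs.get(meta.content_url).decode("utf-8")
```

In a real deployment the client would usually fetch the blob from the CDN directly using `content_url`, so the large payload never passes through the app servers on reads.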

Step 4 — Code generation

Same as the URL shortener: base62 of a range-leased ID or a key-generation service for unique short codes (reuse that lesson). No need to re-derive it — say “same ID strategy as the shortener.”
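As a quick reminder of what "base62 of an ID" means, here is a minimal sketch. The alphabet order (digits, then lowercase, then uppercase) is an arbitrary choice for the example, and the numeric ID is assumed to come from whatever range allocator or key service the shortener design uses.

```python
import string

# 62 symbols: 0-9, a-z, A-Z. The ordering is a convention chosen for this sketch.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Turn a non-negative integer ID into a short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


print(base62_encode(123456789))
print(62 ** 7)  # number of distinct codes of length 7
```

Seven characters give 62^7 ≈ 3.5 trillion codes, far more than 10M pastes/day will consume for centuries.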

Step 5 — Reads (cache + CDN)

Metadata in a cache (Redis) for the hot pastes.
Content via CDN; pastes are immutable, so caching is trivial (no invalidation on update). TTLs are only there to bound cache size and should not outlive a paste’s expires_at.
Private pastes bypass the CDN (or use signed URLs) and check auth at the app.
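The public versus private split can be sketched as follows. This is an illustrative HMAC-based signing scheme, not the signed-URL format of any particular CDN or object store; the hostnames, `SECRET_KEY`, the 300-second lifetime, and the function names are all assumptions made for the example.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"demo-secret"  # hypothetical; a real key lives in a secret store


def _signature(path: str, expires_at: int, key: bytes) -> str:
    msg = f"{path}:{expires_at}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Attach an expiry and an HMAC so the origin can verify without a DB hit."""
    return f"{path}?expires={expires_at}&sig={_signature(path, expires_at, key)}"


def verify(path: str, expires_at: int, sig: str,
           now: Optional[int] = None, key: bytes = SECRET_KEY) -> bool:
    now = int(time.time()) if now is None else now
    if now > expires_at:
        return False  # link has timed out
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(_signature(path, expires_at, key), sig)


def content_location(paste_id: str, visibility: str,
                     requester_is_owner: bool, now: int) -> Optional[str]:
    """Decide where the client should fetch the blob from."""
    path = f"/pastes/{paste_id}"
    if visibility in ("public", "unlisted"):
        # Anyone with the link may read: serve from the shared CDN cache.
        return f"https://cdn.example.com{path}"
    if not requester_is_owner:
        return None  # auth check at the app failed
    # Private: short-lived signed link that bypasses the shared cache.
    return sign_url(f"https://origin.example.com{path}", now + 300)
```

The app checks auth once and hands back a short-lived link, so private content is never stored under a URL that a shared cache would serve to everyone.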

Step 6 — Expiry and limits

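Expiry → expires_at; a background job deletes expired metadata rows and their blobs (don’t leak storage). Also lazy-check on read: a paste past its expires_at is treated as not found even if the job hasn’t reached it yet.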
Size/abuse limits → cap paste size; rate-limit creates; scan for malicious content.
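The expiry handling above combines a lazy check on read with a background sweep. A minimal sketch, assuming plain dicts as stand-ins for the metadata table and the blob store, with a hypothetical `MAX_PASTE_BYTES` cap:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

MAX_PASTE_BYTES = 5 * 1024 * 1024  # hypothetical cap ("a few MB")


def is_expired(expires_at: Optional[datetime], now: datetime) -> bool:
    # None means the paste never expires.
    return expires_at is not None and expires_at <= now


def check_size(text: str) -> None:
    """Reject oversized pastes before anything is written."""
    if len(text.encode("utf-8")) > MAX_PASTE_BYTES:
        raise ValueError("paste exceeds size limit")


def read_paste(paste_id: str, metadata: Dict[str, dict],
               blobs: Dict[str, bytes], now: datetime) -> Optional[bytes]:
    """Lazy check: an expired paste reads as missing even before the sweep runs."""
    row = metadata.get(paste_id)
    if row is None or is_expired(row["expires_at"], now):
        return None
    return blobs.get(paste_id)


def sweep_expired(metadata: Dict[str, dict], blobs: Dict[str, bytes],
                  now: datetime) -> int:
    """Background job: remove expired rows and their blobs; returns the count."""
    expired = [pid for pid, row in metadata.items()
               if is_expired(row["expires_at"], now)]
    for pid in expired:
        # Blob first, row second: if the job dies in between, the row is still
        # there and the next sweep retries, so no blob is left unreferenced.
        blobs.pop(pid, None)
        del metadata[pid]
    return len(expired)
```

A production sweep would query an index on expires_at in batches rather than scanning every row, and rate limiting on creates would sit in front of `check_size`.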

Trade-offs to raise

Blob store + CDN vs DB BLOBs — never store multi-MB blobs in the relational DB (bloats it, kills cache efficiency); the object-store split is the right call.
Immutability simplifies caching — pastes don’t change, so cache freely.
Public (CDN-cacheable) vs private (auth’d, signed URLs) content paths.

The interview cue

“Same short-code generation as the URL shortener, but content is large — so paste text goes to an object store fronted by a CDN, and only small metadata lives in the DB (cached for hot pastes). Pastes are immutable, so caching is trivial; expiry purges both metadata and blobs.” The metadata/blob split + CDN is the new idea here and recurs in every media system ahead.