System design course
Ch.4 · Designing real systems · how to build it

Building Pastebin

Implement the blob-plus-metadata write, signed-URL access for private pastes, immutable CDN caching, and storage-leak-free expiry.


The create path (blob first, then metadata)

Write the large content to the object store, then record small metadata pointing at it:

def create_paste(text, owner, ttl, visibility):
    data = text.encode("utf-8")                   # size limits apply to bytes, not characters
    if len(data) > MAX_SIZE: raise TooLarge()
    code = encode(idgen.get_id())                 # base62, range-leased (see shortener)
    key = f"pastes/{code}"
    blob_store.put(key, data, content_type="text/plain; charset=utf-8")   # step 1: blob (S3-style)
    t = now()
    db.insert(paste_id=code, owner=owner, blob_key=key, size=len(data),  # step 2: metadata
              visibility=visibility, created_at=t,
              expires_at=t + ttl if ttl else None)
    return f"https://lyte.bin/{code}"
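The `encode(idgen.get_id())` line reuses the shortener's ID scheme. Below is a minimal sketch of both halves. It assumes a 0-9, a-z, A-Z base62 alphabet and a hypothetical `lease_block` callable that stands in for the shared counter. That counter must hand each app server a fresh, non-overlapping block of IDs, so servers only coordinate once per block rather than once per paste.

```python
import threading

# Assumed alphabet order; any fixed 62-character ordering works if used consistently.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode(n: int) -> str:
    """Base62-encode a non-negative integer ID into a short paste code."""
    if n < 0:
        raise ValueError("IDs must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


class RangeLeasedIdGen:
    """Hands out IDs locally from a block leased from a central counter.

    `lease_block(size)` is hypothetical: it must atomically reserve `size`
    IDs and return the first one (e.g. an atomic increment in a shared store).
    """

    def __init__(self, lease_block, block_size=1000):
        self._lease_block = lease_block
        self._block_size = block_size
        self._next = 0
        self._end = 0  # exclusive bound; next == end forces a lease on first use
        self._lock = threading.Lock()

    def get_id(self) -> int:
        with self._lock:
            if self._next >= self._end:  # block exhausted: lease a new one
                start = self._lease_block(self._block_size)
                self._next, self._end = start, start + self._block_size
            n = self._next
            self._next += 1
            return n
```

A crashed server simply abandons the rest of its block. That leaves gaps in the ID sequence, which is harmless for paste codes.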

Order matters: write the blob first so a metadata row never points at a missing object. (A periodic sweeper deletes orphan blobs whose metadata write failed.)
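The orphan sweeper needs one safeguard. A blob with no metadata row might belong to a `create_paste` call that has written the blob but not yet inserted its row. The minimal sketch below therefore skips blobs younger than a grace period. `list_blobs`, `metadata_exists` and `delete_blob` are hypothetical callables standing in for the object-store listing, the metadata lookup and the delete. The one-hour default is an assumption.

```python
import time
from dataclasses import dataclass

PREFIX = "pastes/"


@dataclass
class BlobInfo:
    key: str
    last_modified: float  # epoch seconds, as reported by the object store


def sweep_orphans(list_blobs, metadata_exists, delete_blob,
                  grace_seconds=3600, now=None):
    """Delete blobs under PREFIX that no metadata row references.

    Blobs newer than `grace_seconds` are left alone: their metadata
    insert may still be in flight, so a later run re-checks them.
    """
    now = time.time() if now is None else now
    deleted = []
    for blob in list_blobs(PREFIX):
        if now - blob.last_modified < grace_seconds:
            continue  # possibly an in-progress create; check next run
        code = blob.key[len(PREFIX):]
        if not metadata_exists(code):
            delete_blob(blob.key)
            deleted.append(blob.key)
    return deleted
```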

The read path

def read_paste(code, requester):
    meta = cache.get(code) or db.get(code)        # metadata cache
    if not meta or expired(meta): return 404
    if meta.visibility == "private" and requester != meta.owner:
        return 403
    cache.set(code, meta, ttl="1h")
    if meta.visibility == "private":
        return redirect(signed_url(meta.blob_key, ttl="5m"))  # private → short-lived signed URL
    return redirect(cdn_url(meta.blob_key))       # public/unlisted → CDN serves the blob
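To make the branching concrete, here is a runnable sketch of the same read path against in-memory stand-ins. `Meta`, `TtlCache` and the `(status, location)` return value are assumptions made for illustration. `db` can be any object with `.get(code)`, such as a plain dict. `cdn_url` and `signed_url` are passed in as callables, so no real CDN or object-store API is implied.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Meta:
    paste_id: str
    owner: str
    blob_key: str
    visibility: str              # "public" | "unlisted" | "private"
    expires_at: Optional[float]  # epoch seconds; None means never expires


class TtlCache:
    """Tiny in-process stand-in for the metadata cache."""

    def __init__(self):
        self._items = {}

    def get(self, key, now):
        item = self._items.get(key)
        if item is None or item[1] <= now:
            return None  # missing or timed out
        return item[0]

    def set(self, key, value, ttl, now):
        self._items[key] = (value, now + ttl)


def read_paste(code, requester, cache, db, cdn_url, signed_url, now=None):
    now = time.time() if now is None else now
    meta = cache.get(code, now) or db.get(code)
    if meta is None or (meta.expires_at is not None and meta.expires_at <= now):
        return 404, None  # unknown or expired: gone immediately, even before purge
    if meta.visibility == "private" and requester != meta.owner:
        return 403, None  # auth is checked here, before anything is signed
    cache.set(code, meta, 3600.0, now)  # 1h metadata cache, as above
    if meta.visibility == "private":
        return 302, signed_url(meta.blob_key, 300)  # 5-minute signed URL
    return 302, cdn_url(meta.blob_key)  # public and unlisted both go via the CDN
```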

Public vs private content delivery

  • Public/unlisted → served straight from the CDN (immutable, long TTL, great hit ratio).
  • Private → a pre-signed URL with a short expiry so the client fetches the blob directly from the store without it being publicly cacheable; auth is checked at the app before signing.
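A signed URL works because the server that checks it can recompute a MAC over the object key and expiry time. A client cannot forge one or extend its lifetime. The sketch below is a simplified HMAC-SHA256 scheme written for illustration. It is not the format real object stores use; S3 presigned URLs, for example, use AWS Signature Version 4. The secret, the `blobs.example.com` host and the `expires`/`sig` parameter names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical key shared by the app and the blob gateway


def _sign(key: str, expires: int, secret: bytes) -> str:
    msg = f"{key}\n{expires}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def signed_url(blob_key, ttl_seconds=300, secret=SECRET,
               base="https://blobs.example.com", now=None):
    """Issue a URL for blob_key that stops working after ttl_seconds."""
    now = int(time.time() if now is None else now)
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "sig": _sign(blob_key, expires, secret)})
    return f"{base}/{blob_key}?{query}"


def verify(url, secret=SECRET, now=None):
    """Gateway-side check: unexpired, and the signature matches key + expiry."""
    now = int(time.time() if now is None else now)
    parts = urlparse(url)
    key = parts.path.lstrip("/")
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if now >= expires:
        return False
    return hmac.compare_digest(sig, _sign(key, expires, secret))
```

Because the expiry is part of the signed message, editing `expires` in the URL invalidates the signature. `hmac.compare_digest` avoids leaking the signature through timing differences.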

Immutability = trivial caching

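Pastes never change, so there’s no invalidation problem: cache metadata and content for as long as the paste lives. For pastes with no expiry, TTLs exist only to reclaim memory/edge space. For expiring pastes, cap the cache lifetime at expires_at so edges stop serving a paste once it has expired. This is why immutable designs (content-addressed ones included) are so cache-friendly.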
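One way to apply that cap at the HTTP layer is to derive the `Cache-Control` header from the paste's metadata. This minimal sketch assumes a hypothetical `cache_control` helper, a one-year ceiling for pastes that never expire, and `no-store` for private content. `immutable` is the standard Cache-Control extension (RFC 8246) that tells clients not to revalidate.

```python
import time
from typing import Optional

ONE_YEAR = 365 * 24 * 3600  # assumed ceiling for never-expiring pastes


def cache_control(visibility: str, expires_at: Optional[float],
                  now: Optional[float] = None) -> str:
    """Cache forever (effectively) unless the paste expires sooner."""
    now = time.time() if now is None else now
    if visibility == "private":
        return "private, no-store"  # signed-URL content must not sit in shared caches
    max_age = ONE_YEAR
    if expires_at is not None:
        # Never let an edge keep the paste past its own expiry.
        max_age = max(0, min(max_age, int(expires_at - now)))
    return f"public, max-age={max_age}, immutable"
```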

Expiry without leaking storage

The subtle bug: deleting expired metadata but leaving the blob wastes storage forever. Delete both:

def purge_expired():
    for meta in db.find_expired(limit=10_000):    # batched
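        blob_store.delete(meta.blob_key)          # blob first: if a later step fails, the row still holds the key, so the next run retries
        db.delete(meta.paste_id)                  # then metadata (reads already 404 it, since it is expired)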

Run it on a schedule (reuse the job scheduler); also lazily 404 expired pastes on read so they appear gone immediately.
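Here is a slightly fuller version of the purge job, runnable against hypothetical in-memory stand-ins (`MemoryDb`, `MemoryBlobStore`) for the metadata DB and blob store. It keeps the blob-then-metadata order. It also keeps pulling batches until none remain, so one scheduled run can drain a backlog. Passing `now` into `find_expired` is an assumption about that query's shape.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Meta:
    paste_id: str
    blob_key: str
    expires_at: Optional[float]  # epoch seconds; None means never expires


class MemoryDb:
    def __init__(self, rows):
        self.rows = {m.paste_id: m for m in rows}

    def find_expired(self, now, limit):
        expired = [m for m in self.rows.values()
                   if m.expires_at is not None and m.expires_at <= now]
        return expired[:limit]

    def delete(self, paste_id):
        self.rows.pop(paste_id, None)


class MemoryBlobStore:
    def __init__(self, keys):
        self.keys = set(keys)

    def delete(self, key):
        self.keys.discard(key)  # deleting a missing key succeeds, so retries are safe


def purge_expired(db, blob_store, batch_size=10_000, now=None):
    """Delete expired pastes batch by batch; returns how many were purged."""
    now = time.time() if now is None else now
    purged = 0
    while True:
        batch = db.find_expired(now, limit=batch_size)
        if not batch:
            return purged
        for meta in batch:
            blob_store.delete(meta.blob_key)  # blob first
            db.delete(meta.paste_id)          # then metadata
            purged += 1
```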

Scaling and failure handling

  • Reads scale via CDN + metadata cache; the DB only holds small rows.
  • Metadata DB shards by paste_id; blob store scales independently.
  • Blob store down → reads of cached/CDN’d content still work; new writes fail fast and retry.
  • Hot paste (a viral snippet) → CDN absorbs it; nothing special needed.
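Sharding by `paste_id` can be as simple as hashing the ID onto a fixed number of shards. This minimal sketch uses a hypothetical `shard_for` helper with SHA-1 purely as a well-distributed, stable hash. Python's built-in `hash()` is avoided because string hashing is randomized per process, so different app servers would disagree.

```python
import hashlib


def shard_for(paste_id: str, num_shards: int) -> int:
    """Map a paste ID to a shard index in [0, num_shards)."""
    digest = hashlib.sha1(paste_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Hashing matters here because range-leased IDs are roughly sequential. Range-partitioning on the raw ID would send every new paste to the same "newest" shard. Plain modulo remaps most keys when `num_shards` changes, so consistent hashing or a lookup directory is the usual upgrade once resharding is expected.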

The takeaway

Concrete signals: a blob-first write with metadata pointing at it, the CDN for public pastes and signed URLs for private ones, immutability as a license to cache freely, and expiry that deletes both blob and metadata. It’s the URL shortener’s ID generation and caching plus the metadata/object-store split: the same pattern behind Dropbox, S3, and the media systems that follow.