Building Pastebin
Implement the blob-plus-metadata write, signed-URL access for private pastes, immutable CDN caching, and storage-leak-free expiry.
The create path (blob first, then metadata)
Write the large content to the object store, then record small metadata pointing at it:
def create_paste(text, owner, ttl, visibility):
    if len(text) > MAX_SIZE:
        raise TooLarge()
    code = encode(idgen.get_id())  # base62, range-leased (see shortener)
    key = f"pastes/{code}"
    blob_store.put(key, text, content_type="text/plain")  # S3-style
    db.insert(paste_id=code, owner=owner, blob_key=key, size=len(text),
              visibility=visibility, created_at=now(),
              expires_at=now() + ttl if ttl else None)
    return f"https://lyte.bin/{code}"
Order matters: write the blob first so a metadata row never points at a missing object. (A periodic sweeper deletes orphan blobs whose metadata write failed.)
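The orphan sweeper mentioned above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the in-memory `blobs` dict stands in for a bucket listing, `known_keys` stands in for the blob keys present in the metadata DB, and the one-hour grace period is an assumption added here so the sweeper does not delete a fresh blob whose metadata write is still in flight.

```python
import time

GRACE_SECONDS = 3600  # assumed grace period; tune to the slowest expected metadata write


def find_orphans(blobs, known_keys, now, grace=GRACE_SECONDS):
    """Return blob keys that have no metadata row and are older than the grace period.

    blobs: dict of blob_key -> upload timestamp (stand-in for a bucket listing)
    known_keys: set of blob_key values present in the metadata DB
    """
    return sorted(
        key
        for key, uploaded_at in blobs.items()
        if key not in known_keys and now - uploaded_at > grace
    )


def sweep(blobs, known_keys, now=None):
    """Delete orphan blobs and return the keys that were removed."""
    now = time.time() if now is None else now
    orphans = find_orphans(blobs, known_keys, now)
    for key in orphans:
        del blobs[key]  # stand-in for blob_store.delete(key)
    return orphans
```

Without the grace period, a sweep that runs between the blob write and the metadata insert would delete a blob that is about to become valid, which recreates the dangling-pointer problem the blob-first order was meant to avoid.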
The read path
def read_paste(code, requester):
    meta = cache.get(code) or db.get(code)  # metadata cache, DB on a miss
    if not meta or expired(meta):
        return 404
    if meta.visibility == "private" and requester != meta.owner:
        return 403
    cache.set(code, meta, ttl="1h")
    if meta.visibility in ("public", "unlisted"):
        return redirect(cdn_url(meta.blob_key))  # CDN serves the blob
    else:
        return redirect(signed_url(meta.blob_key, ttl="5m"))  # private → short-lived signed URL
Public vs private content delivery
- Public/unlisted → served straight from the CDN (immutable, long TTL, great hit ratio).
- Private → a pre-signed URL with a short expiry so the client fetches the blob directly from the store without it being publicly cacheable; auth is checked at the app before signing.
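To make the signed-URL idea concrete, here is a minimal sketch of an HMAC-signed link with an expiry. It illustrates the general technique only: real object stores ship their own pre-signing mechanisms, and the `SECRET`, `BASE`, query parameter names, and the `verify` helper below are all hypothetical names chosen for this example.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"demo-secret"             # hypothetical; a real key would come from a secrets manager
BASE = "https://blobs.example.com"  # hypothetical blob origin


def _signature(blob_key, expires):
    # Sign the key together with the expiry so neither can be altered by the client.
    msg = f"{blob_key}:{expires}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def signed_url(blob_key, ttl_seconds=300, now=None):
    """Return a URL that is valid for ttl_seconds from now."""
    now = int(time.time()) if now is None else int(now)
    expires = now + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(blob_key, expires)})
    return f"{BASE}/{blob_key}?{query}"


def verify(blob_key, expires, sig, now=None):
    """Check that the link has not expired and the signature matches."""
    now = int(time.time()) if now is None else int(now)
    if now > int(expires):
        return False
    return hmac.compare_digest(sig, _signature(blob_key, int(expires)))
```

The app performs the ownership check before calling `signed_url`; the party serving the blob only needs the shared secret to validate the link, so it never has to know who the owner is.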
Immutability = trivial caching
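Pastes never change, so there’s no invalidation problem: a cached copy of the metadata or the content can never be stale. TTLs exist mainly to reclaim memory/edge space, with one caveat: for a paste that has an expiry, cap the cache lifetime at its remaining time to live so the edge stops serving it once it is gone. This is why content-addressed, immutable designs are so cache-friendly.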
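One way to express this at the HTTP layer is a `Cache-Control` header chosen per paste. The sketch below is an assumption about how one might set it, not something the design above prescribes: the function name, the one-year ceiling, and the choice to cap `max-age` at the paste's remaining lifetime (so an edge cache does not outlive the paste's expiry) are all illustrative.

```python
YEAR = 365 * 24 * 3600  # assumed upper bound for edge caching, in seconds


def cache_control(visibility, expires_at, now):
    """Build a Cache-Control value for a paste blob.

    expires_at and now are epoch seconds; expires_at is None for pastes
    that never expire.
    """
    if visibility == "private":
        # Private blobs should not be stored by shared caches at all.
        return "private, no-store"
    if expires_at is None:
        max_age = YEAR
    else:
        # Never let a cache hold the blob longer than the paste itself lives.
        max_age = max(0, min(YEAR, int(expires_at - now)))
    if max_age == 0:
        return "no-store"
    # "immutable" tells clients the body will not change within max-age.
    return f"public, max-age={max_age}, immutable"
```

For example, `cache_control("public", None, 0)` yields `public, max-age=31536000, immutable`, while a paste with ten minutes left gets `max-age=600`.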
Expiry without leaking storage
The subtle bug: deleting expired metadata but leaving the blob wastes storage forever. Delete both:
def purge_expired():
    for meta in db.find_expired(limit=10_000):  # batched
        blob_store.delete(meta.blob_key)  # blob first
        db.delete(meta.paste_id)  # then metadata
Run it on a schedule (reuse the job scheduler); also lazily 404 expired pastes on read so they appear gone immediately.
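A runnable version of the purge, using in-memory dicts as stand-ins for the metadata DB and blob store, might look like the sketch below. The loop that drains batches until none remain, the `batch_size` parameter, and the tolerance for an already-missing blob (so a run that crashed between the two deletes can be retried safely) are assumptions added for illustration.

```python
def expired(meta, now):
    """A paste is expired once its expires_at (epoch seconds) has passed."""
    return meta["expires_at"] is not None and meta["expires_at"] <= now


def purge_expired(db, blobs, now, batch_size=2):
    """Delete expired pastes in batches; return how many were purged.

    db: dict of paste_id -> metadata dict (stand-in for the metadata DB)
    blobs: dict of blob_key -> content (stand-in for the blob store)
    """
    purged = 0
    while True:
        batch = [m for m in db.values() if expired(m, now)][:batch_size]
        if not batch:
            return purged
        for meta in batch:
            blobs.pop(meta["blob_key"], None)  # blob first; ignore if already gone
            del db[meta["paste_id"]]           # then metadata
            purged += 1
```

Deleting the blob first means a crash mid-purge leaves an expired metadata row with no blob. That row still 404s on read because of the lazy expiry check, and the next run removes it.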
Scaling and failure handling
- Reads scale via CDN + metadata cache; the DB only holds small rows.
- Metadata DB shards by paste_id; blob store scales independently.
- Blob store down → reads of cached/CDN’d content still work; new writes fail fast and retry.
- Hot paste (a viral snippet) → CDN absorbs it; nothing special needed.
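Sharding the metadata DB by paste_id can be as simple as hashing the ID to a shard index. The sketch below assumes a fixed shard count and plain modulo placement; the function name is hypothetical, and a production system might prefer consistent hashing so that changing the shard count moves fewer rows.

```python
import zlib


def shard_for(paste_id, num_shards):
    """Map a paste_id to a shard index in [0, num_shards).

    CRC32 is used because it is stable across processes and machines,
    unlike Python's built-in hash() for strings.
    """
    return zlib.crc32(paste_id.encode("utf-8")) % num_shards
```

Because every read and write for a paste carries its paste_id, each request touches exactly one shard and no cross-shard queries are needed on the hot path.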
The takeaway
Concrete signals: blob-first write with metadata pointing at it, CDN for public / signed URLs for private, immutable = cache-freely, and expiry that deletes both blob and metadata. It’s the URL shortener’s ID generation and caching plus the metadata/object-store split, the same pattern that sits under Dropbox, S3, and the media systems covered next.