
Building Shopify (multi-tenant storefronts)

Implement tenant-scoped data access, domain→pod routing, per-tenant rate limiting, and theme rendering.


Framework-enforced tenant scoping

The most dangerous bug in shared-schema multi-tenancy is a query that omits the tenant filter and returns another store’s data. Make scoping automatic rather than a matter of per-query discipline:

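# Every request resolves a tenant; data access goes through a context that injects tenant_id.
# Assumes a SQLAlchemy-style session `db` and a mapped `Product` model.
class TenantContext:
    def __init__(self, tenant_id):
        self.tenant_id = tenant_id

    def query(self, model, **filters):
        # filter_by takes keyword equality filters; tenant_id is always one of them
        return db.query(model).filter_by(tenant_id=self.tenant_id, **filters)

# Callers never pass tenant_id; the context adds it to every query it builds.
products = ctx.query(Product, category="shoes")    # implicitly WHERE tenant_id = ...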

Better still, push it into the ORM/middleware (or DB row-level security) so no code path can bypass it. This is the isolation guarantee for shared-schema tenants.
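To make the idea of a data layer that cannot be bypassed more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The class name TenantScopedDB, the ALLOWED_TABLES allow-list, and the products schema are all assumptions for illustration, not part of any real framework. The point is only that the tenant predicate is added in one place, and callers have no parameter through which to omit it.

```python
import sqlite3

# Hypothetical allow-list: table name -> filterable columns.
# Restricting names here keeps caller input out of the SQL text.
ALLOWED_TABLES = {"products": {"name", "category"}}


class TenantScopedDB:
    """Wraps a connection so every read carries a tenant_id predicate."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: int):
        if tenant_id is None:
            raise ValueError("tenant_id is required")
        self._conn = conn
        self._tenant_id = tenant_id

    def select(self, table: str, **filters):
        columns = ALLOWED_TABLES[table]  # KeyError for unknown tables
        unknown = set(filters) - columns
        if unknown:
            raise ValueError(f"unknown filter columns: {sorted(unknown)}")
        # The tenant predicate is always first and cannot be overridden.
        clauses = ["tenant_id = ?"] + [f"{col} = ?" for col in filters]
        params = [self._tenant_id, *filters.values()]
        sql = f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)
        return self._conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (tenant_id INTEGER, name TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "runner", "shoes"), (2, "loafer", "shoes"), (1, "cap", "hats")],
)

store_1 = TenantScopedDB(conn, tenant_id=1)
rows = store_1.select("products", category="shoes")  # only tenant 1's shoes
```

Code that holds the raw connection can still run unscoped SQL, which is why database row-level security is the stronger option when the database supports it.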

Domain → tenant → pod routing

def route(request):
    # cached lookup: domain → {tenant_id, pod}
    tenant = tenant_directory.get(request.host)
    if tenant is None:
        return 404                                   # unknown domain
    pod = pods[tenant.pod]
    return pod.handle(request, TenantContext(tenant.tenant_id))

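The tenant directory (domain → tenant → pod) sits on the hot path of every request, so it is cached at the edge. Moving a merchant to another pod (for example, because they outgrew a shared one) updates the directory, and routing follows once the cached entry expires or is invalidated.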
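The cached directory can be sketched as a small TTL cache in front of an authoritative mapping. Everything here is an assumption for illustration: the TenantDirectory and TenantRecord names, the 30-second TTL, and the plain dict standing in for the real directory store. It is a single-process sketch; a real edge cache would also need to propagate invalidations across nodes.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class TenantRecord:
    tenant_id: int
    pod: str


class TenantDirectory:
    """TTL cache over an authoritative host -> TenantRecord mapping."""

    def __init__(self, source: Dict[str, TenantRecord], ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._source = source      # stand-in for the real directory store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, TenantRecord]] = {}

    @staticmethod
    def _normalize(host: str) -> str:
        # Hostnames are case-insensitive; drop any port.
        # Assumes a DNS name, not an IPv6 literal.
        return host.lower().split(":")[0]

    def get(self, host: str) -> Optional[TenantRecord]:
        host = self._normalize(host)
        now = self._clock()
        entry = self._cache.get(host)
        if entry and entry[0] > now:
            return entry[1]                      # fresh cache hit
        record = self._source.get(host)
        if record is not None:
            self._cache[host] = (now + self._ttl, record)
        return record

    def move_tenant(self, host: str, new_pod: str) -> None:
        host = self._normalize(host)
        old = self._source[host]
        self._source[host] = TenantRecord(old.tenant_id, new_pod)
        self._cache.pop(host, None)              # invalidate so routing follows


directory = TenantDirectory({"shoes.example": TenantRecord(42, "pod-a")})
first = directory.get("Shoes.Example:443")       # cached after this call
directory.move_tenant("shoes.example", "pod-b")
after_move = directory.get("shoes.example")      # sees the new pod
```

Without the explicit invalidation in move_tenant, requests would keep landing on the old pod until the TTL ran out, so a real migration has to either tolerate that window or have the old pod forward requests.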

The pod model

Each pod is a self-contained stack (services + DB) hosting a subset of tenants. Onboarding a store assigns it to a pod by capacity; huge merchants get dedicated pods:

def assign_pod(tenant):
    if tenant.expected_volume > DEDICATED_THRESHOLD:
        return provision_dedicated_pod(tenant)
    return least_loaded_shared_pod()                 # balance density

A pod failure affects only its tenants (bounded blast radius), and pods scale independently.
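As a rough illustration of capacity-based placement, the sketch below picks the shared pod with the most headroom and creates a dedicated pod above a threshold. The Pod dataclass, the threshold value, the use of expected request volume as the only load metric, and the 2x capacity for dedicated pods are all assumptions made for the example. Real placement would likely weigh storage, memory, and region as well.

```python
from dataclasses import dataclass
from typing import List

DEDICATED_THRESHOLD = 10_000  # hypothetical cutoff, in expected requests/min


@dataclass
class Pod:
    name: str
    capacity: int
    load: int = 0
    dedicated: bool = False

    @property
    def headroom(self) -> int:
        return self.capacity - self.load


def assign_pod(pods: List[Pod], tenant_name: str, expected_volume: int) -> Pod:
    if expected_volume > DEDICATED_THRESHOLD:
        # Large merchant: gets its own pod, sized with 2x headroom (arbitrary).
        pod = Pod(f"dedicated-{tenant_name}", capacity=expected_volume * 2,
                  dedicated=True)
        pods.append(pod)
    else:
        candidates = [p for p in pods
                      if not p.dedicated and p.headroom >= expected_volume]
        if not candidates:
            raise RuntimeError("no shared pod has capacity; provision a new one")
        pod = max(candidates, key=lambda p: p.headroom)  # least loaded
    pod.load += expected_volume
    return pod


pods = [Pod("pod-a", capacity=1000, load=700), Pod("pod-b", capacity=1000, load=200)]
small = assign_pod(pods, "tiny-store", 100)      # lands on pod-b
big = assign_pod(pods, "mega-store", 50_000)     # gets a dedicated pod
```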

Per-tenant rate limiting (noisy-neighbor)

One store’s traffic spike must not starve others — apply per-tenant quotas (reuse the rate limiter, keyed by tenant):

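def guard(ctx, request):
    # one quota per (tenant, route): a burst counts only against the tenant that caused it
    if not rate_limiter.allow(f"tenant:{ctx.tenant_id}", request.route):
        return 429                                   # Too Many Requests
    return None                                      # within quota: continue handling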

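Combined with pod isolation, this contains a flash sale: the quota caps the busy tenant's request rate, and any load that still gets through stays inside that tenant's pod.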
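One common way to implement such a limiter is a token bucket per tenant and route. The sketch below is an in-memory, single-process version. The TenantRateLimiter name, its allow(tenant_id, route) signature, and the rate and burst values are assumptions for illustration. A production limiter would usually keep its counters in a shared store so that all servers see the same quota.

```python
import time
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Token bucket keyed by (tenant_id, route)."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate_per_sec
        self._burst = burst
        self._clock = clock
        # key -> (tokens remaining, time of last update)
        self._buckets: Dict[Tuple[int, str], Tuple[float, float]] = {}

    def allow(self, tenant_id: int, route: str) -> bool:
        key = (tenant_id, route)
        now = self._clock()
        tokens, last = self._buckets.get(key, (float(self._burst), now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(float(self._burst), tokens + (now - last) * self._rate)
        if tokens < 1:
            self._buckets[key] = (tokens, now)
            return False
        self._buckets[key] = (tokens - 1, now)
        return True


# A controllable clock makes the behaviour easy to follow.
fake_now = [0.0]
limiter = TenantRateLimiter(rate_per_sec=1.0, burst=2, clock=lambda: fake_now[0])

tenant_1 = [limiter.allow(1, "/checkout") for _ in range(3)]  # third call is rejected
tenant_2 = limiter.allow(2, "/checkout")                      # separate bucket, allowed
fake_now[0] = 1.0
refilled = limiter.allow(1, "/checkout")                      # one token refilled
```

Because each tenant has its own bucket, tenant 1 exhausting its burst has no effect on tenant 2's quota.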

Theme rendering

Storefronts share code; themes are data (templates + assets per tenant) rendered per request:

def render_storefront(ctx, page):
    theme = theme_store.get(ctx.tenant_id)           # tenant's template + settings
    data = ctx.query_page_data(page)                 # tenant-scoped products/collections
    return template_engine.render(theme[page], data) # shared engine, tenant template+data

Static theme assets are served from a CDN under per-tenant paths; an app platform (webhooks + scoped APIs) lets third parties extend stores without touching core code.
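To illustrate themes as data on a shared engine, the sketch below uses Python's string.Template as a stand-in for a real template language. The THEME_STORE layout, the settings keys, and the sample templates are hypothetical. The same render function serves both tenants, and only the stored template and settings differ.

```python
import html
from string import Template

# Hypothetical theme store: tenant_id -> settings + per-page templates.
THEME_STORE = {
    1: {
        "settings": {"accent": "teal"},
        "templates": {
            "product": Template('<h1 style="color:$accent">$title</h1><p>$price</p>'),
        },
    },
    2: {
        "settings": {"accent": "crimson"},
        "templates": {
            "product": Template('<article class="$accent"><b>$title</b> for $price</article>'),
        },
    },
}


def render_storefront(tenant_id: int, page: str, data: dict) -> str:
    theme = THEME_STORE[tenant_id]
    # Merge theme settings with page data, escaping every value so that
    # merchant-supplied text cannot inject markup.
    merged = {**theme["settings"], **data}
    safe = {key: html.escape(str(value)) for key, value in merged.items()}
    return theme["templates"][page].substitute(safe)


product = {"title": "Runner <2>", "price": "80 USD"}
page_1 = render_storefront(1, "product", product)
page_2 = render_storefront(2, "product", product)
```

Real theme engines typically sandbox templates as well, since merchants and theme authors are not trusted to run arbitrary code on shared servers.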

Scale and failure handling

  • Uneven tenants → pods balance small stores; big ones get dedicated capacity.
  • Noisy neighbor → per-tenant rate limits + pod isolation cap the blast radius.
  • Pod failure → only its tenants affected; replicate within the pod for HA.
  • Data leak risk → framework/RLS-enforced scoping removes the forgotten-filter bug; only code that bypasses the data layer can cross tenants.
  • Flash sale → the heavy tenant’s pod absorbs it (or is pre-scaled); others unaffected.

The takeaway

Concrete signals: framework-enforced tenant scoping (no per-query discipline), domain → tenant → pod routing via a cached directory, the pod model for balanced density/isolation, per-tenant rate limiting for noisy neighbors, and themes-as-data on shared code. Multi-tenancy (isolation + routing + blast-radius control) is the reusable SaaS-platform pattern.