Designing a load balancer
Build the distributor itself as a system — health-checked backend pools, an algorithm, and the redundancy that keeps the balancer from being the single point of failure.
The problem
Chapter 2 used a load balancer; now design one as a system. A load balancer accepts client connections and forwards them to one of many backend servers, tracking health and spreading load — while not becoming the bottleneck or single point of failure itself.
Step 1 — Requirements
Functional: distribute incoming requests across a pool; health-check backends and remove unhealthy ones; support add/remove of backends at runtime; optionally sticky sessions and TLS termination.
Non-functional: very high throughput and low added latency (it’s on every request); highly available (its failure = total outage); horizontally scalable.
Step 2 — L4 vs L7
- L4 (transport) — forwards by IP/port, doesn’t parse the request. Extremely fast, protocol-agnostic; can’t route by URL or do HTTP features.
- L7 (application) — parses HTTP; routes by path/header/cookie, terminates TLS, does content-based routing. More work per request, far more capable.
Many stacks use both: L4 at the very edge for raw throughput, L7 behind it for smart routing.
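As a rough illustration of the difference, the sketch below contrasts what each layer can see when it chooses a backend. The pool names, addresses, the `/api/` prefix, and both function names are hypothetical, and a checksum of the client address stands in for whatever selection a real L4 balancer would use.

```python
import zlib

# Hypothetical pools; names and addresses are illustrative only.
POOLS = {
    "api": ["10.0.1.1:8080", "10.0.1.2:8080"],
    "static": ["10.0.2.1:8080"],
}
ALL_BACKENDS = POOLS["api"] + POOLS["static"]


def pick_l4(src_ip: str, src_port: int) -> str:
    """L4-style choice: only the connection's addresses are visible."""
    key = f"{src_ip}:{src_port}".encode()
    return ALL_BACKENDS[zlib.crc32(key) % len(ALL_BACKENDS)]


def pick_l7(path: str) -> str:
    """L7-style choice: the parsed request path selects the pool."""
    pool = POOLS["api"] if path.startswith("/api/") else POOLS["static"]
    # Spread requests within the chosen pool by hashing the path.
    return pool[zlib.crc32(path.encode()) % len(pool)]
```

The L4 function cannot send `/api/` traffic to a dedicated pool because it never sees the path; that is the capability the L7 tier pays for with per-request parsing.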
Step 3 — Health checks
The balancer continuously probes backends:
- Active: periodic GET /health; mark down after K consecutive failures, back up after M successes (hysteresis avoids flapping).
- Passive: observe live traffic; eject a backend that returns errors/timeouts.
A “down” backend is pulled from rotation; this is what turns a crash into a non-event. The health endpoint should reflect real readiness (DB reachable, etc.).
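The K-failures / M-successes rule can be sketched as a small state machine. This is a minimal illustration, not any particular balancer's implementation: the class name `HealthState` and the field names `fall` and `rise` (standing in for K and M) are assumptions, and the probe itself is left out so that only the hysteresis logic is shown.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Tracks one backend's status with hysteresis."""
    fall: int = 3          # K: consecutive failures before marking down
    rise: int = 2          # M: consecutive successes before marking up
    healthy: bool = True
    streak: int = 0        # consecutive results that contradict `healthy`

    def record(self, ok: bool) -> bool:
        """Feed one probe result; return the (possibly updated) status."""
        if ok == self.healthy:
            # Result agrees with the current status: the opposing streak ends.
            self.streak = 0
            return self.healthy
        self.streak += 1
        threshold = self.rise if ok else self.fall
        if self.streak >= threshold:
            self.healthy = ok
            self.streak = 0
        return self.healthy
```

With `fall=3`, two failed probes followed by a success leave the backend in rotation, because the success resets the failure streak. A single slow response therefore does not eject a server.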
Step 4 — The algorithm
Pick per workload (Chapter 2): round robin when requests cost about the same and servers are identical, least connections when request cost varies, weighted variants for mixed server sizes, and hashing or consistent hashing for cache affinity or stickiness. Default to round robin or least connections.
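A minimal sketch of three of these policies follows, assuming the caller passes in only backends that are currently healthy. The function and class names are illustrative. The weighted version is deliberately naive: it repeats each backend in a row, whereas production balancers typically interleave the picks more smoothly.

```python
import itertools
from typing import Iterator


def round_robin(backends: list[str]) -> Iterator[str]:
    """Cycle through backends in order."""
    return itertools.cycle(backends)


def weighted_round_robin(weights: dict[str, int]) -> Iterator[str]:
    """Naive weighting: each backend appears `weight` times per cycle."""
    expanded = [b for b, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)


class LeastConnections:
    """Send each new request to the backend with the fewest in flight."""

    def __init__(self, backends: list[str]) -> None:
        self.active = {b: 0 for b in backends}

    def acquire(self) -> str:
        # Ties go to the backend listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        # Call when the request finishes so the count stays accurate.
        self.active[backend] -= 1
```

Least connections needs the `release` call on every completed request, which is the bookkeeping cost it pays for adapting to uneven request durations.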
Step 5 — High availability of the balancer itself
The crux: don’t let the LB be the SPOF. Standard approaches:
- Active–passive pair with a floating/virtual IP — a heartbeat between two LBs; if the active dies, the passive takes over the VIP (VRRP/keepalived). Failover in seconds.
- Active–active — multiple LBs all live, traffic spread across them by DNS round robin or an upstream anycast address.
- DNS / anycast at the very front so even a whole LB cluster has a backup.
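The active–passive takeover can be sketched from the standby's point of view. This is a simplified model, not VRRP or keepalived: the class name `StandbyMonitor` and the `dead_after` timeout are assumptions, timestamps are passed in explicitly so the logic is easy to follow, and preemption and split-brain handling are ignored.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Passive-side logic: promote this node once the active goes silent."""
    dead_after: float = 3.0      # seconds without a heartbeat before takeover
    last_heartbeat: float = 0.0  # timestamp of the most recent heartbeat
    is_active: bool = False      # True once this node has taken the VIP

    def on_heartbeat(self, now: float) -> None:
        """Record a heartbeat received from the active balancer."""
        self.last_heartbeat = now

    def tick(self, now: float) -> bool:
        """Call periodically; returns True if this node holds the VIP."""
        if not self.is_active and now - self.last_heartbeat > self.dead_after:
            # A real system would claim the virtual IP at this point.
            self.is_active = True
        return self.is_active
```

The timeout is the main tuning knob: a shorter `dead_after` fails over sooner but makes a false takeover more likely when heartbeats are merely delayed.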
High-level design
               ┌─ heartbeat ─┐
clients → [LB-active]   [LB-passive]   (shared virtual IP; failover on death)
               │
  round-robin / least-conn over healthy backends
               ▼
        [s1] [s2] [s3] ...   (health-checked pool)
Trade-offs to raise
- L4 speed vs L7 features — choose by whether you need content-aware routing.
- Stateful vs stateless balancing — sticky sessions help in-memory backends but hurt failover and elasticity; prefer stateless backends + shared session store.
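- Failover time vs cost: active–active keeps every node serving traffic and loses only one node's share when it fails, but the survivors need spare headroom to absorb that share, DNS-based spreading recovers only as fast as clients stop using the dead address, and it is more complex to run. Active–passive is simpler but leaves a standby idle and takes seconds to fail over.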
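The capacity side of the failover trade-off reduces to simple arithmetic. The helper below is a sketch under two assumptions: load is spread evenly across the balancers, and it redistributes evenly across the survivors after a failure. The function name is illustrative.

```python
def max_safe_utilization(nodes: int, tolerated_failures: int = 1) -> float:
    """Highest per-node utilization at which survivors can absorb the load."""
    if nodes <= tolerated_failures:
        raise ValueError("need more nodes than tolerated failures")
    return (nodes - tolerated_failures) / nodes
```

Under these assumptions, two active balancers can each run at no more than 50% if one must be able to carry everything alone, which is the same spare capacity an active–passive pair holds in its idle standby. With four nodes the limit rises to 75%.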
The interview cue
“L4 at the edge for throughput, L7 behind it for routing and TLS; health checks with hysteresis to eject bad backends; least connections for uneven request cost; and crucially an active–passive pair on a virtual IP (or active–active via anycast) so the balancer itself isn’t a single point of failure.” Designing the LB’s own availability is the part that separates this from just “add a load balancer.” Implementation — connection tracking and consistent hashing — is next.