Designing a load balancer

Build the distributor itself as a system — health-checked backend pools, an algorithm, and the redundancy that keeps the balancer from being the single point of failure.

The problem

Chapter 2 used a load balancer; now design one as a system. A load balancer accepts client connections and forwards them to one of many backend servers, tracking health and spreading load — while not becoming the bottleneck or single point of failure itself.

Step 1 — Requirements

Functional: distribute incoming requests across a pool; health-check backends and remove unhealthy ones; support add/remove of backends at runtime; optionally sticky sessions and TLS termination.

Non-functional: very high throughput and low added latency (it’s on every request); highly available (its failure = total outage); horizontally scalable.

Step 2 — L4 vs L7

L4 (transport): forwards by IP and port without parsing the request. Extremely fast and protocol-agnostic, but it can't route by URL or apply HTTP-level features.
L7 (application): parses HTTP, so it can route by path, header, or cookie and terminate TLS. More work per request, far more capable.

Many stacks use both: L4 at the very edge for raw throughput, L7 behind it for smart routing.
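To make the difference concrete, here is a minimal sketch of what each layer can base its decision on. The names (Route, pick_backend_l4, pick_pool_l7) and the pool labels are hypothetical, and hashing the client address with CRC32 is only one illustrative way to pin a connection to a backend.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str   # URL path prefix, e.g. "/api/"
    pool: str     # name of the backend pool that serves it


def pick_backend_l4(src_ip: str, src_port: int, backends: list[str]) -> str:
    """L4 view: only addresses and ports are visible, never the URL."""
    key = f"{src_ip}:{src_port}".encode()
    # A stable hash keeps every packet of one connection on the same backend.
    return backends[zlib.crc32(key) % len(backends)]


def pick_pool_l7(path: str, routes: list[Route], default_pool: str) -> str:
    """L7 view: the request is parsed, so the longest matching prefix wins."""
    matches = [r for r in routes if path.startswith(r.prefix)]
    if not matches:
        return default_pool
    return max(matches, key=lambda r: len(r.prefix)).pool
```

With routes for "/api/" and "/api/admin/", a request for "/api/admin/users" lands on the admin pool, which is a decision the L4 function has no way to make because it never sees the path.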

Step 3 — Health checks

The balancer continuously probes backends:

Active — periodic GET /health; mark down after K consecutive failures, back up after M successes (hysteresis avoids flapping).
Passive — observe live traffic; eject a backend that returns errors/timeouts.

A “down” backend is pulled from rotation; this is what turns a crash into a non-event. The health endpoint should reflect real readiness (DB reachable, etc.).
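The K-failures / M-successes rule can be sketched as a small state machine. This is an illustrative sketch, not any particular balancer's implementation: the class name BackendHealth and the parameter names fail_threshold (K) and rise_threshold (M) are assumptions, and the probe itself is left to the caller so the same tracker could be fed by active probes or by passive observation of live traffic.

```python
class BackendHealth:
    """Tracks one backend's up/down state with hysteresis."""

    def __init__(self, fail_threshold: int = 3, rise_threshold: int = 2):
        self.fail_threshold = fail_threshold  # K consecutive failures -> down
        self.rise_threshold = rise_threshold  # M consecutive successes -> up
        self.healthy = True
        self._fails = 0
        self._successes = 0

    def record(self, ok: bool) -> bool:
        """Feed one probe result; returns the backend's current state."""
        if ok:
            self._fails = 0
            self._successes += 1
            if not self.healthy and self._successes >= self.rise_threshold:
                self.healthy = True
        else:
            self._successes = 0
            self._fails += 1
            if self.healthy and self._fails >= self.fail_threshold:
                self.healthy = False
        return self.healthy
```

Because each counter resets on the opposite result, a backend that alternates between one failure and one success never crosses either threshold and keeps its current state, which is the anti-flapping behaviour the hysteresis is there to provide.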

Step 4 — The algorithm

Pick per workload (Chapter 2): round robin (similar servers and similar request cost), least connections (variable request cost), weighted variants (mixed server sizes), hashing / consistent hashing (cache affinity or stickiness). Default: round robin or least connections.
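The two default choices are small enough to sketch side by side. The class names, the healthy-set argument, and the acquire/release methods are assumptions made for illustration; a real balancer would also need locking or a single-threaded event loop around these counters.

```python
class RoundRobin:
    """Cycle through backends in order, skipping unhealthy ones."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._i = 0

    def pick(self, healthy) -> str:
        # Visit each backend at most once per call.
        for _ in range(len(self._backends)):
            b = self._backends[self._i]
            self._i = (self._i + 1) % len(self._backends)
            if b in healthy:
                return b
        raise RuntimeError("no healthy backends")


class LeastConnections:
    """Send each new request to the backend with the fewest in flight."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def acquire(self, healthy) -> str:
        candidates = [b for b in self.active if b in healthy]
        if not candidates:
            raise RuntimeError("no healthy backends")
        b = min(candidates, key=lambda name: self.active[name])
        self.active[b] += 1
        return b

    def release(self, backend: str) -> None:
        # Call when the request or connection finishes.
        self.active[backend] -= 1
```

Round robin needs no feedback from the backends, which is why it is cheap. Least connections needs the release call on every completion, and that bookkeeping is what lets it steer new work away from a backend stuck on slow requests.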

Step 5 — High availability of the balancer itself

The crux: don’t let the LB be the SPOF. Standard approaches:

Active–passive pair with a floating/virtual IP — a heartbeat between two LBs; if the active dies, the passive takes over the VIP (VRRP/keepalived). Failover in seconds.
Active–active — multiple LBs all live, traffic spread across them by DNS round robin or an upstream anycast address.
DNS / anycast at the very front so even a whole LB cluster has a backup.
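The active–passive takeover reduces to a timeout on the heartbeat. The sketch below shows only the standby's decision logic, with the clock passed in so it can be reasoned about deterministically; StandbyMonitor, dead_after, and the choice to hand the address back when the active returns are all assumptions for illustration. In practice a VRRP implementation such as keepalived makes this decision and moves the virtual IP itself.

```python
class StandbyMonitor:
    """Standby-side view of the heartbeat from the active balancer."""

    def __init__(self, dead_after: float, now: float):
        self.dead_after = dead_after   # seconds of silence before takeover
        self.last_heartbeat = now
        self.holds_vip = False

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now
        # Policy choice (assumed here): give the VIP back once the
        # active balancer is heard from again.
        self.holds_vip = False

    def tick(self, now: float) -> bool:
        """Call periodically; returns True while the standby owns the VIP."""
        if not self.holds_vip and now - self.last_heartbeat > self.dead_after:
            # A real node would claim the virtual IP at this point.
            self.holds_vip = True
        return self.holds_vip
```

The dead_after value is the trade-off knob: a short timeout shrinks the outage window, while a longer one avoids a takeover triggered by a single delayed heartbeat.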

High-level design

            ┌─ heartbeat ─┐
 clients → [LB-active]  [LB-passive]   (shared virtual IP; failover on death)
              │
   round-robin / least-conn over healthy backends
              ▼
        [s1] [s2] [s3] ... (health-checked pool)

Trade-offs to raise

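L4 speed vs L7 features: choose by whether you need content-aware routing.
Stateful vs stateless balancing: sticky sessions help backends that keep session state in memory, but they hurt failover and elasticity; prefer stateless backends plus a shared session store.
Failover time vs cost: active–active keeps every balancer serving traffic, so no standby sits idle and a failure removes only one node's share instead of triggering a takeover. It is more complex than active–passive, each node still needs headroom to absorb a failed peer's load, and with DNS round robin some clients keep using a dead address until their cached record expires.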

The interview cue

“L4 at the edge for throughput, L7 behind it for routing and TLS; health checks with hysteresis to eject bad backends; least connections for uneven request cost; and crucially an active–passive pair on a virtual IP (or active–active via anycast) so the balancer itself isn’t a single point of failure.” Designing the LB’s own availability is the part that separates this from just “add a load balancer.” Implementation — connection tracking and consistent hashing — is next.