System design course
Ch.4 · Designing real systems · how to build it · 8 min read

Building a load balancer

Implement backend selection and health tracking, use consistent hashing for affinity, and handle connection draining and failover without dropping requests.


Backend pool with health state

Keep a registry of backends, each with health and load counters; selection only considers healthy ones:

class Backend:
    def __init__(self, addr, weight=1):
        self.addr, self.weight = addr, weight
        self.healthy = True
        self.active_conns = 0
        self.fail_count = 0

class Pool:
    def __init__(self, backends): self.backends = backends
    def healthy(self): return [b for b in self.backends if b.healthy]

The selection algorithms

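import itertools

class RoundRobin:
    def __init__(self, pool):
        self.pool = pool
        self._counter = itertools.count()   # ever-increasing request index
    def pick(self):
        live = self.pool.healthy()
        # Index into the current healthy list; None when nothing is healthy.
        return live[next(self._counter) % len(live)] if live else None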

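class LeastConnections:
    def __init__(self, pool): self.pool = pool
    def pick(self):
        live = self.pool.healthy()
        # Backend with the fewest in-flight connections; None when nothing is healthy.
        return min(live, key=lambda b: b.active_conns) if live else None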

On dispatch, increment active_conns; on response or close, decrement it. That keeps least-connections accurate: a backend tied up with long requests holds a higher count and so receives fewer new ones.
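One way to keep the counter honest is to wrap each dispatch in a context manager, so the decrement runs even when the request raises. This is a minimal sketch, not the lesson's code: `track` and `demo_backend` are hypothetical names, and the `Backend` here is a trimmed copy of the one above.

```python
from contextlib import contextmanager

class Backend:
    # Trimmed copy of the lesson's Backend, just enough for this sketch.
    def __init__(self, addr):
        self.addr = addr
        self.active_conns = 0

@contextmanager
def track(backend):
    """Count one request against `backend` for as long as the block runs."""
    backend.active_conns += 1
    try:
        yield backend
    finally:
        # Runs on success and on exception, so the count cannot leak upward.
        backend.active_conns -= 1

demo_backend = Backend("10.0.0.1:8080")
with track(demo_backend):
    in_flight = demo_backend.active_conns   # counted while the request is open
after = demo_backend.active_conns           # released once the block exits
```

A leaked increment is worse than it looks: least-connections would treat that backend as permanently busier than it is and starve it of traffic.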

Health checking loop

A background task probes each backend and ejects it only after several consecutive failures, so a single blip doesn’t remove a node; one successful probe readmits it:

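import asyncio

async def health_loop(pool):
    # probe() is an assumed async helper: True if the backend answers within the timeout.
    while True:
        for b in pool.backends:
            ok = await probe(b.addr, "/health", timeout=1.0)
            if ok:
                b.fail_count = 0; b.healthy = True         # readmit on first success
            else:
                b.fail_count += 1
                if b.fail_count >= 3: b.healthy = False    # eject after 3 consecutive fails
        await asyncio.sleep(2)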
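If a flapping backend should also have to prove itself before rejoining, the same idea can be applied in both directions. The sketch below is one possible variant, not the loop above: `HealthState`, `record_probe`, and the `fall`/`rise` thresholds are names assumed for illustration.

```python
from dataclasses import dataclass

@dataclass
class HealthState:
    healthy: bool = True
    fails: int = 0      # consecutive failed probes
    passes: int = 0     # consecutive successful probes

def record_probe(state, ok, fall=3, rise=2):
    """Apply one probe result; flip health only after a streak in that direction."""
    if ok:
        state.passes += 1
        state.fails = 0
        if not state.healthy and state.passes >= rise:
            state.healthy = True
    else:
        state.fails += 1
        state.passes = 0
        if state.healthy and state.fails >= fall:
            state.healthy = False
    return state.healthy

state = HealthState()
probes = [False, True, False, False, False, True, True]
history = [record_probe(state, ok) for ok in probes]
# history == [True, True, True, True, False, False, True]
```

The lone failure at the start is absorbed, three failures in a row eject the backend, and it takes two successes in a row to bring it back.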

Consistent hashing for affinity

When you need a key (session/cache key) to stick to the same backend even as the pool changes, use consistent hashing with virtual nodes (Chapter 2) so adding/removing a backend remaps only ~1/N of keys instead of all of them:

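import bisect, hashlib

def hash_(s):
    # Assumed helper: a stable 64-bit hash (Python's built-in hash() of a str varies per process).
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, backends, vnodes=150):
        self.ring = {}                          # hash -> backend
        for b in backends:
            for i in range(vnodes):             # vnodes points per backend smooth the distribution
                self.ring[hash_(f"{b.addr}#{i}")] = b
        self.sorted = sorted(self.ring)
    def pick(self, key):
        h = hash_(key)
        # First ring point clockwise from h, wrapping around at the end.
        idx = bisect.bisect(self.sorted, h) % len(self.sorted)
        return self.ring[self.sorted[idx]]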

This is also how you’d shard sticky sessions without a shared store — though externalizing sessions (stateless backends) is usually cleaner.
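The "only ~1/N of keys move" claim is easy to check empirically. The following sketch rebuilds a small ring from plain address strings, removes one of five backends, and counts how many keys change owner. The helper names (`stable_hash`, `build_ring`, `pick`), the addresses, and the choice of MD5 are assumptions made for the demonstration.

```python
import bisect
import hashlib

def stable_hash(s):
    # Assumption: first 8 bytes of MD5 as an integer; any stable, well-mixed hash would do.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")

def build_ring(addrs, vnodes=150):
    ring = {stable_hash(f"{a}#{i}"): a for a in addrs for i in range(vnodes)}
    return ring, sorted(ring)

def pick(ring, points, key):
    idx = bisect.bisect(points, stable_hash(key)) % len(points)
    return ring[points[idx]]

addrs = [f"10.0.0.{n}:8080" for n in range(1, 6)]    # 5 backends
keys = [f"session-{n}" for n in range(10_000)]

before_ring = build_ring(addrs)
after_ring = build_ring(addrs[:-1])                  # remove the last backend

before = {k: pick(*before_ring, k) for k in keys}
after = {k: pick(*after_ring, k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved) / len(keys)
# Only keys that were owned by the removed backend change owner; the rest stay put.
only_removed_moved = all(before[k] == addrs[-1] for k in moved)
print(f"moved {moved_fraction:.1%} of keys")
```

With five backends the moved fraction should land near one fifth. A plain `hash(key) % N` scheme would instead reshuffle most keys when N changes.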

Connection draining (don’t drop requests)

When removing a backend (deploy/scale-down), drain it: stop sending new connections but let in-flight ones finish before shutdown.

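def drain(backend):
    # wait_until() and remove() are assumed helpers, not shown here.
    backend.healthy = False          # stop new traffic (the health loop must not flip this back)
    wait_until(lambda: backend.active_conns == 0, timeout=30)   # let in-flight requests finish
    remove(backend)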

Same idea for graceful failover — never cut a live request if you can let it complete.
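One wrinkle when draining is combined with a periodic health check: if both write the same `healthy` flag, a successful probe can put a draining backend back into rotation. A sketch of one way around that, assuming a separate `draining` flag and an asyncio-based balancer (`accepts_traffic`, `poll`, and the demo values are hypothetical):

```python
import asyncio

class Backend:
    def __init__(self, addr):
        self.addr = addr
        self.healthy = True       # written only by the health checker
        self.draining = False     # written only by deploy/scale-down tooling
        self.active_conns = 0

    def accepts_traffic(self):
        return self.healthy and not self.draining

async def drain(backend, timeout=30.0, poll=0.05):
    """Stop new traffic, then wait for in-flight requests. True if fully drained."""
    backend.draining = True
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while backend.active_conns > 0:
        if loop.time() >= deadline:
            return False          # timed out; the caller decides whether to force-close
        await asyncio.sleep(poll)
    return True

async def demo():
    b = Backend("10.0.0.1:8080")
    b.active_conns = 1            # one request is in flight when the drain starts

    async def finish_request():
        await asyncio.sleep(0.1)  # stand-in for the in-flight request completing
        b.active_conns -= 1

    task = asyncio.create_task(finish_request())
    drained = await drain(b, timeout=2.0)
    await task
    return drained, b.accepts_traffic(), b.active_conns

drained, accepting, remaining = asyncio.run(demo())
```

Selection would then filter on `accepts_traffic()` rather than `healthy` alone. Returning `False` on timeout leaves the force-close decision to the caller instead of hiding it.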

Failover of the balancer itself

The active–passive VIP from the design lesson, in practice: both LBs run keepalived; the passive watches the active’s heartbeat (VRRP advertisements) and claims the virtual IP, announcing it with a gratuitous ARP, when the beats stop. Clients keep using the same IP, and the switchover takes seconds. Backend health state should be reconstructable (re-probe on takeover) so the new active doesn’t start blind.
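The passive side's decision is a timeout on missed heartbeats. The sketch below illustrates only that logic; it is not how keepalived is implemented. `StandbyMonitor`, `dead_after`, and the `claim_vip` callback are hypothetical, and the clock is injectable so the example runs deterministically.

```python
import time

class StandbyMonitor:
    """Passive-side logic: take over once the active's heartbeats stop."""

    def __init__(self, dead_after=3.0, clock=time.monotonic):
        self.dead_after = dead_after      # seconds of silence before takeover
        self.clock = clock
        self.last_beat = clock()
        self.is_active = False

    def on_heartbeat(self):
        self.last_beat = self.clock()

    def tick(self, claim_vip):
        """Call periodically; `claim_vip` is a hypothetical callback that moves the VIP."""
        silent_for = self.clock() - self.last_beat
        if not self.is_active and silent_for > self.dead_after:
            self.is_active = True
            claim_vip()                   # e.g. bind the VIP and send a gratuitous ARP
        return self.is_active

# Drive it with a fake clock instead of real time.
now = [0.0]
events = []
claim = lambda: events.append("claimed")
monitor = StandbyMonitor(dead_after=3.0, clock=lambda: now[0])

now[0] = 2.0
monitor.on_heartbeat()                    # active is still alive at t=2
now[0] = 4.0
state_at_4 = monitor.tick(claim)          # 2s of silence: stay passive
now[0] = 5.5
state_at_5_5 = monitor.tick(claim)        # 3.5s of silence: take over
now[0] = 9.0
state_at_9 = monitor.tick(claim)          # already active: no second claim
```

After takeover, the new active would re-probe every backend before trusting any inherited health state.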

Edge cases

  • All backends unhealthy → return 503 fast (don’t hang); alert.
  • Thundering herd on recovery → ramp traffic to a just-recovered backend (slow start) instead of full load instantly.
  • Uneven long-lived connections (WebSockets) → least-connections or explicit connection counting, since round robin ignores duration.
  • Retry idempotency → only auto-retry on another backend when the request is idempotent; replaying a non-idempotent request (e.g. a POST) can apply it twice.
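Two of the edge cases above, slow start and idempotent-only retries, reduce to small pure functions. This is a sketch under assumptions: the linear ramp, the 10% floor, the 30-second window, and the function names are illustrative choices, while the method set follows the HTTP definition of idempotent methods.

```python
# Methods that HTTP semantics define as idempotent.
IDEMPOTENT_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"}

def slow_start_weight(weight, seconds_since_recovery, ramp_seconds=30.0):
    """Effective weight for a just-recovered backend: linear ramp with a 10% floor."""
    if seconds_since_recovery >= ramp_seconds:
        return weight
    fraction = max(0.1, seconds_since_recovery / ramp_seconds)
    return weight * fraction

def may_retry(method, attempt, max_attempts=2):
    """Allow a retry on another backend only when repeating the request is safe."""
    return method.upper() in IDEMPOTENT_METHODS and attempt < max_attempts

w_start = slow_start_weight(10, 0)      # 10% floor right after recovery
w_half = slow_start_weight(10, 15)      # halfway through the ramp
w_full = slow_start_weight(10, 60)      # ramp finished
```

A weighted selector would use the effective weight in place of the configured one until the ramp ends. A POST should be retried only if the application supplies its own deduplication, such as an idempotency key.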

The takeaway

Concrete signals: health checks with hysteresis, least-connections for uneven load, consistent hashing for affinity, connection draining so deploys drop nothing, and a VIP failover so the LB isn’t a SPOF. These are the same primitives (consistent hashing, heartbeats, stateless backends) you’ll reuse across every distributed system ahead.