Load balancing
The traffic cop that spreads requests across servers, removes single points of failure, and lets you scale out at all.
What it does
A load balancer (LB) sits in front of a pool of servers and distributes incoming requests across them. It’s the component that makes horizontal scaling usable: clients hit one stable address, and the LB fans traffic out to whatever healthy servers exist behind it.
Three jobs at once:
- Spread load so no one server is overwhelmed while others idle.
- Remove single points of failure — if a server dies, the LB stops sending it traffic.
- Enable elasticity — add or drain servers without clients noticing.
Where load balancers live
You rarely have just one. They appear at every tier:
clients → [LB] → web servers → [LB] → application servers → [LB] → databases
Placing a balancer between tiers means each tier can scale independently of the others.
Health checks
The LB continuously probes its backends (e.g. an HTTP GET /health). A server that fails checks is pulled from rotation until it recovers. This is what turns a server crash from an outage into a non-event, but it only works if your app exposes a real health endpoint that reflects whether it can actually serve.
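The probe-and-remove loop described above can be sketched roughly as follows. This is a minimal illustration, not how any particular balancer implements it: the Backend and HealthChecker names, the addresses, and the fail/rise thresholds are all assumptions made for the example, and the probe is passed in as a function so the sketch runs without a network (in practice it would issue something like the GET /health request).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Backend:
    address: str
    healthy: bool = True
    failures: int = 0   # consecutive failed probes
    successes: int = 0  # consecutive passed probes


@dataclass
class HealthChecker:
    # probe(address) returns True if the backend passed its health check.
    probe: Callable[[str], bool]
    fail_threshold: int = 3  # assumed: consecutive failures before removal
    rise_threshold: int = 2  # assumed: consecutive passes before re-admission
    backends: List[Backend] = field(default_factory=list)

    def run_once(self) -> None:
        """Probe every backend once and update its rotation status."""
        for b in self.backends:
            if self.probe(b.address):
                b.failures = 0
                b.successes += 1
                if not b.healthy and b.successes >= self.rise_threshold:
                    b.healthy = True
            else:
                b.successes = 0
                b.failures += 1
                if b.healthy and b.failures >= self.fail_threshold:
                    b.healthy = False

    def in_rotation(self) -> List[str]:
        """Addresses that should currently receive traffic."""
        return [b.address for b in self.backends if b.healthy]


# Simulated health endpoint results, keyed by (hypothetical) address.
status = {"10.0.0.1": True, "10.0.0.2": False}
checker = HealthChecker(
    probe=lambda addr: status[addr],
    backends=[Backend("10.0.0.1"), Backend("10.0.0.2")],
)
for _ in range(3):
    checker.run_once()
print(checker.in_rotation())  # ['10.0.0.1']
```

Requiring several consecutive failures before removal, and several passes before re-admission, is one common way to keep a single slow response from flapping a server in and out of rotation.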
Don’t make the balancer the single point of failure
If everything flows through one LB and it dies, you’re down. Production setups run redundant load balancers — typically an active–passive pair sharing a virtual IP, where the passive takes over on failover, or an active–active set behind DNS. Mentioning this unprompted is a good reliability signal.
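As a rough illustration of the active–passive idea, the toy model below has a standby claim the virtual IP after the current owner misses a few heartbeats. The class names, node names, and the missed_limit value are hypothetical; real deployments usually delegate this to a redundancy protocol such as VRRP rather than hand-rolled code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LBNode:
    name: str
    alive: bool = True


class ActivePassivePair:
    """Toy model of two balancers sharing one virtual IP (VIP)."""

    def __init__(self, primary: LBNode, standby: LBNode, missed_limit: int = 3):
        self.primary = primary
        self.standby = standby
        self.missed_limit = missed_limit  # assumed: missed heartbeats before takeover
        self.missed = 0
        self.vip_owner: Optional[LBNode] = primary

    def heartbeat_tick(self) -> None:
        """One heartbeat interval: check whether the VIP owner is still alive."""
        if self.vip_owner is not None and self.vip_owner.alive:
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.missed_limit:
            # Hand the VIP to any node that is still alive, if one exists.
            survivors = [n for n in (self.primary, self.standby) if n.alive]
            self.vip_owner = survivors[0] if survivors else None
            self.missed = 0


pair = ActivePassivePair(LBNode("lb-a"), LBNode("lb-b"))
pair.primary.alive = False           # the active balancer dies
for _ in range(3):
    pair.heartbeat_tick()
print(pair.vip_owner.name)           # lb-b
```

In this sketch the standby keeps the VIP even after the original primary recovers (no preemption); whether to fail back automatically is a design choice.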
L4 vs L7 balancing
- Layer 4 (transport): routes by IP/port, blind to the request contents. Very fast, cheap, protocol-agnostic.
- Layer 7 (application): understands HTTP, so it can route by URL path, header, or cookie (e.g. /api/* to one pool, /video/* to another), terminate TLS, and do sticky sessions. More work per request, far more flexible.
Use L7 when you need content-aware routing; L4 when you need raw speed.
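The difference in what each layer can see is easiest to show side by side. In the sketch below, the pool names, the route table, and the function names are invented for illustration. The L7 function decides from the request path; the L4 function only ever receives the client IP and port, and the CRC-based pick is just a placeholder for whatever distribution algorithm is in use.

```python
import zlib
from typing import List, Tuple

# Hypothetical path-prefix routes and pool names.
ROUTES: List[Tuple[str, str]] = [
    ("/api/", "api-pool"),
    ("/video/", "video-pool"),
]
DEFAULT_POOL = "web-pool"


def route_l7(path: str) -> str:
    """Pick a pool from the request path; the longest matching prefix wins."""
    matches = [(prefix, pool) for prefix, pool in ROUTES if path.startswith(prefix)]
    if not matches:
        return DEFAULT_POOL
    return max(matches, key=lambda m: len(m[0]))[1]


def route_l4(client_ip: str, client_port: int, backends: List[str]) -> str:
    """Pick a backend from connection-level facts only; the path is never seen."""
    key = f"{client_ip}:{client_port}".encode()
    return backends[zlib.crc32(key) % len(backends)]


print(route_l7("/api/users"))     # api-pool
print(route_l7("/video/42.mp4"))  # video-pool
print(route_l7("/index.html"))    # web-pool
print(route_l4("203.0.113.7", 51432, ["10.0.0.1:443", "10.0.0.2:443"]))
```

Note that route_l4 could not send /api/ and /video/ traffic to different pools even if it wanted to: that information only exists once the HTTP request has been parsed.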
The sticky-session wrinkle
If a server stores per-user session state in memory, the LB must keep sending that user to the same server (a “sticky session”). This works but undermines elasticity and fault tolerance — if that server dies, the session is gone. The better pattern: keep servers stateless and push session state into a shared store (a cache or database), so any server can handle any request. (See the stateful-vs-stateless trade-off in Chapter 3.)
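A small sketch of the stateless pattern, under the assumption that a plain in-process dictionary stands in for the shared cache or database. SessionStore, StatelessServer, and the shopping-cart payload are hypothetical names chosen for the example.

```python
from typing import Dict, Optional


class SessionStore:
    """Stand-in for a shared cache or database reachable by every server."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, session_id: str) -> Optional[dict]:
        return self._data.get(session_id)

    def put(self, session_id: str, session: dict) -> None:
        self._data[session_id] = session


class StatelessServer:
    """Keeps no per-user state in memory; everything lives in the store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def handle(self, session_id: str, item: str) -> dict:
        session = self.store.get(session_id) or {"cart": []}
        session["cart"].append(item)
        self.store.put(session_id, session)
        return {"served_by": self.name, "cart": list(session["cart"])}


store = SessionStore()
web1 = StatelessServer("web-1", store)
web2 = StatelessServer("web-2", store)

web1.handle("session-abc", "book")
# The next request lands on a different server and still sees the cart.
print(web2.handle("session-abc", "pen"))
# {'served_by': 'web-2', 'cart': ['book', 'pen']}
```

Because neither server holds the cart itself, the LB is free to send each request anywhere, and losing web-1 loses no session data.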
The how of distributing requests — round robin, least connections, hashing — is the next lesson.