System design course
Ch. 2 · The building blocks · concept · 7 min read

Load-balancing algorithms

How a load balancer actually picks a server — round robin, least connections, weighted, and hashing — and when each one wins.


The choice the balancer makes

Every request, the load balancer answers one question: which backend gets this? The algorithm it uses changes how evenly load lands, especially when requests vary in cost or servers vary in power.

Round robin

Hand requests to servers in rotation: 1, 2, 3, 1, 2, 3… Dead simple and fair when requests are uniform and servers are identical. Its weakness: it ignores how busy each server actually is, so new requests can pile up behind one slow request on a server that round robin keeps feeding.
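
To make the rotation concrete, here is a minimal sketch in Python. The class name `RoundRobinBalancer` and the server labels are made up for illustration; a real balancer would also handle health checks and concurrency.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        # cycle() repeats the list forever: s1, s2, s3, s1, ...
        self._rotation = cycle(list(servers))

    def pick(self):
        return next(self._rotation)


balancer = RoundRobinBalancer(["s1", "s2", "s3"])
picks = [balancer.pick() for _ in range(6)]
print(picks)  # ['s1', 's2', 's3', 's1', 's2', 's3']
```

Note that `pick()` never looks at what the servers are doing, which is exactly the weakness described above.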

Weighted round robin

Give beefier servers a higher weight so they receive proportionally more requests. Useful for heterogeneous fleets (mixed instance sizes) or gradual rollouts (send 5% of traffic to a new version).
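
One way to implement weighting is the "smooth" variant sketched below, which interleaves picks instead of sending a burst to the heaviest server. This is an illustrative choice, not the only approach; the class name and the `stable`/`canary` labels are assumptions. The weights 19 and 1 model the 5% rollout mentioned above.

```python
from collections import Counter


class SmoothWeightedRoundRobin:
    """Picks servers in proportion to their weights, spreading picks evenly."""

    def __init__(self, weights):
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be a non-empty mapping of positive numbers")
        self.weights = dict(weights)
        self.total = sum(self.weights.values())
        self.current = {server: 0 for server in self.weights}

    def pick(self):
        # Every server earns credit equal to its weight on each round.
        for server, weight in self.weights.items():
            self.current[server] += weight
        # The server with the most credit is chosen and pays back the total.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


balancer = SmoothWeightedRoundRobin({"stable": 19, "canary": 1})
counts = Counter(balancer.pick() for _ in range(20))
print(counts["stable"], counts["canary"])  # 19 1, so the canary sees 5% of traffic
```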

Least connections

Send the next request to the server with the fewest active connections. This adapts to reality: long-lived or expensive requests keep a server’s count high, so it naturally receives less new work. Better than round robin when request durations vary a lot (e.g. some requests stream, others return instantly).
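
A minimal sketch of the least-connections rule, assuming the balancer is told when a request starts and ends. The `acquire`/`release` method names are hypothetical, and ties are broken by server order.

```python
class LeastConnectionsBalancer:
    """Routes each new request to the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # min() returns the first server with the lowest active count.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        # Call when the request finishes so the server becomes attractive again.
        self.active[server] -= 1


balancer = LeastConnectionsBalancer(["s1", "s2", "s3"])
first_three = [balancer.acquire() for _ in range(3)]
print(first_three)  # ['s1', 's2', 's3']

balancer.release("s2")          # s2 finished a short request
next_pick = balancer.acquire()  # s1 and s3 are still busy with long ones
print(next_pick)                # s2
```

The servers stuck on long requests keep their counts high, so the quick one absorbs the new work.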

A variant, least response time, also factors in how fast each server has been replying.

Hashing (and why it matters)

Route based on a hash of some request attribute — commonly the client IP or a session/URL key:

  • IP hash gives a client a consistent server (a cheap form of session stickiness) without a shared session store.
  • URL/key hash sends the same key to the same server every time — essential for cache locality: requests for object X always hit the node that already cached X.

The catch with plain hash(key) % N: change N (add or remove a server) and most keys remap; going from N to N+1 servers moves about N/(N+1) of them. That’s the exact problem consistent hashing solves (its own lesson later in this chapter): it’s what you reach for when the backend set changes and you want minimal disruption.
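
The remapping problem is easy to measure. The sketch below uses `hashlib.md5` only as a stable, well-mixed hash (Python's built-in `hash()` for strings varies between runs); the `bucket` helper and the `object-…` key names are made up for illustration.

```python
import hashlib


def bucket(key: str, n: int) -> int:
    """Map a key to one of n servers with plain hash(key) % n."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n


keys = [f"object-{i}" for i in range(10_000)]

# Same key, same N: always the same server (this is what gives cache locality).
assert bucket("object-42", 4) == bucket("object-42", 4)

# Grow the fleet from 4 to 5 servers and count the keys that change servers.
moved = sum(bucket(k, 4) != bucket(k, 5) for k in keys)
fraction_moved = moved / len(keys)
print(f"{fraction_moved:.0%} of keys changed servers")  # expect roughly 80%
```

A key stays put only when its hash has the same remainder mod 4 and mod 5, which happens for about 1 in 5 keys, so adding a single server invalidates most of the cached placement.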

Random (with two choices)

Pick two servers at random and send the request to the less-loaded of the two. Surprisingly effective — it gets most of the benefit of “least connections” without the balancer tracking global state, which makes it cheap at huge scale.
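
The effect can be seen in a small simulation. This is a sketch under simplifying assumptions: requests never finish, "load" is just a running count per server, and all function names are hypothetical. It compares the gap between the busiest and idlest server under pure random choice versus two random choices.

```python
import random


def pick_random(load, rng):
    """Baseline: one server chosen uniformly at random."""
    return rng.choice(list(load))


def pick_two_choices(load, rng):
    """Sample two distinct servers and keep the less-loaded one."""
    first, second = rng.sample(list(load), 2)
    return first if load[first] <= load[second] else second


def simulate(pick, servers=10, requests=10_000, seed=0):
    """Return the gap between the most and least loaded server."""
    rng = random.Random(seed)
    load = {f"s{i}": 0 for i in range(servers)}
    for _ in range(requests):
        load[pick(load, rng)] += 1
    return max(load.values()) - min(load.values())


random_spread = simulate(pick_random)
two_choice_spread = simulate(pick_two_choices)
print("random:", random_spread, "two choices:", two_choice_spread)
```

Each decision looks at only two counters rather than scanning the whole fleet, yet the two-choice gap should come out far smaller than the purely random one.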

Picking one in an interview

  • Uniform, stateless requests → round robin.
  • Variable request cost → least connections.
  • Mixed server sizes → weighted.
  • Need cache locality or stickiness → hashing (reach for consistent hashing when the server set changes).

Say what you’re optimizing for and the choice follows.