Token bucket vs leaky bucket
The two classic rate-limiting algorithms — one allows controlled bursts, the other enforces a perfectly smooth rate — plus the window-based alternatives.
The job: rate limiting and traffic shaping
You need to cap how fast requests are accepted — to protect a service from abuse or overload, or to smooth a bursty stream. Token bucket and leaky bucket are the two canonical algorithms, and the difference is how they treat bursts.
Token bucket
A bucket holds up to B tokens and refills at R tokens/sec. Each request consumes one token; if the bucket is empty, the request is rejected (or queued).
refill: tokens = min(B, tokens + R * elapsed)
allow request if tokens >= 1: tokens -= 1, else reject
- Bursts: allowed, up to the bucket size
B. If traffic was idle, tokens accumulate (to a cap), so a sudden burst of up toBis served immediately — then it’s limited to the steady rateR. - Feel: “average rate
R, but tolerate short spikes.” Matches real user behavior (people click in bursts). - State: just a token count + last-refill timestamp per key — cheap, easy to distribute (store in Redis).
- Used by: most API rate limiters (Stripe, AWS, etc.) — it’s the common default.
Leaky bucket
Requests enter a queue (the bucket) and are processed — “leak” — at a fixed rate, like water draining through a hole. If the queue is full, new requests overflow and are dropped.
queue requests; process exactly R per second;
if queue is full (capacity B): drop the new request
- Bursts: smoothed away. Output is a perfectly constant rate regardless of how bursty the input was — a burst just fills the queue and drains steadily.
- Feel: “strictly constant output rate.” Great for traffic shaping protecting a downstream that can only handle a fixed throughput.
- Cost: queued requests wait (added latency), and bursts beyond the queue are dropped; you maintain an actual queue.
The core difference
Token bucket allows bursts (up to the bucket) then limits the average. Leaky bucket forbids bursts — it enforces a smooth, constant output rate.
Pick token bucket when occasional bursts are fine and you care about the average (typical API limits). Pick leaky bucket when a downstream needs a strictly even flow no matter the input shape.
The window-based alternatives (worth knowing)
Rate limiting also gets done with counters over time windows:
- Fixed window — count requests per fixed interval (e.g. per minute), reset at the boundary. Simple, but allows a 2× burst at the boundary (end of one window + start of the next).
- Sliding window log — store timestamps and count those within the last 60s. Accurate, but memory-heavy.
- Sliding window counter — blend the current and previous fixed windows by weight. A cheap, accurate-enough middle ground that fixes the boundary spike.
The interview cue
When a design needs an API rate limiter (a whole Chapter 4 problem), reach for
token bucket: “Token bucket per API key in Redis — B burst capacity refilling
at R/sec — so we tolerate natural bursts but cap the sustained rate. If a fragile
downstream needed a perfectly constant feed, I’d use a leaky bucket to shape it.
At scale I’d note the boundary-burst flaw of fixed windows and prefer a sliding
window counter.” Naming the burst behavior of each is exactly the distinction
being tested.