Skip to content
System design course
Ch.3 · Trade-offs that define a design·concept ·7 min read

Token bucket vs leaky bucket

The two classic rate-limiting algorithms — one allows controlled bursts, the other enforces a perfectly smooth rate — plus the window-based alternatives.


The job: rate limiting and traffic shaping

You need to cap how fast requests are accepted — to protect a service from abuse or overload, or to smooth a bursty stream. Token bucket and leaky bucket are the two canonical algorithms, and the difference is how they treat bursts.

Token bucket

A bucket holds up to B tokens and refills at R tokens/sec. Each request consumes one token; if the bucket is empty, the request is rejected (or queued).

refill: tokens = min(B, tokens + R * elapsed)
allow request if tokens >= 1: tokens -= 1, else reject
  • Bursts: allowed, up to the bucket size B. If traffic was idle, tokens accumulate (to a cap), so a sudden burst of up to B is served immediately — then it’s limited to the steady rate R.
  • Feel: “average rate R, but tolerate short spikes.” Matches real user behavior (people click in bursts).
  • State: just a token count + last-refill timestamp per key — cheap, easy to distribute (store in Redis).
  • Used by: most API rate limiters (Stripe, AWS, etc.) — it’s the common default.

Leaky bucket

Requests enter a queue (the bucket) and are processed — “leak” — at a fixed rate, like water draining through a hole. If the queue is full, new requests overflow and are dropped.

queue requests; process exactly R per second;
if queue is full (capacity B): drop the new request
  • Bursts: smoothed away. Output is a perfectly constant rate regardless of how bursty the input was — a burst just fills the queue and drains steadily.
  • Feel: “strictly constant output rate.” Great for traffic shaping protecting a downstream that can only handle a fixed throughput.
  • Cost: queued requests wait (added latency), and bursts beyond the queue are dropped; you maintain an actual queue.

The core difference

Token bucket allows bursts (up to the bucket) then limits the average. Leaky bucket forbids bursts — it enforces a smooth, constant output rate.

Pick token bucket when occasional bursts are fine and you care about the average (typical API limits). Pick leaky bucket when a downstream needs a strictly even flow no matter the input shape.

The window-based alternatives (worth knowing)

Rate limiting also gets done with counters over time windows:

  • Fixed window — count requests per fixed interval (e.g. per minute), reset at the boundary. Simple, but allows a 2× burst at the boundary (end of one window + start of the next).
  • Sliding window log — store timestamps and count those within the last 60s. Accurate, but memory-heavy.
  • Sliding window counter — blend the current and previous fixed windows by weight. A cheap, accurate-enough middle ground that fixes the boundary spike.

The interview cue

When a design needs an API rate limiter (a whole Chapter 4 problem), reach for token bucket: “Token bucket per API key in Redis — B burst capacity refilling at R/sec — so we tolerate natural bursts but cap the sustained rate. If a fragile downstream needed a perfectly constant feed, I’d use a leaky bucket to shape it. At scale I’d note the boundary-burst flaw of fixed windows and prefer a sliding window counter.” Naming the burst behavior of each is exactly the distinction being tested.