Skip to content
All reading material
Patterns·12 min read

Randomized Algorithms

Uniform sampling, fair shuffles, reservoir sampling, rejection sampling, weighted pick, and expected-time selection — with the off-by-one bias traps.


Why randomness helps

Randomization buys simplicity and expected-time guarantees without worst-case fragility — uniform sampling, unbiased shuffles, average-O(n) selection, and Monte-Carlo estimates. The catch: you must reason in expectation and prove the distribution is actually uniform (a subtle off-by-one is the classic bug).

Fisher–Yates shuffle

The only correct in-place uniform shuffle — swap each position with a random one at or after it:

import random
def shuffle(a):
    for i in range(len(a) - 1, 0, -1):
        j = random.randint(0, i)          # inclusive 0..i — the key to uniformity
        a[i], a[j] = a[j], a[i]
    return a

The common broken version picks j from the whole array each step — that’s biased (not all n! permutations equally likely). The range must shrink.

Reservoir sampling

Pick k items uniformly from a stream of unknown length in O(1) memory — keep the i-th item with probability k/i:

def sample_one(stream):                   # k = 1
    chosen = None
    for i, x in enumerate(stream, 1):
        if random.randint(1, i) == 1:     # keep x with probability 1/i
            chosen = x
    return chosen

For k > 1, fill the reservoir with the first k, then replace a random slot with probability k/i. Powers “Linked List Random Node,” “Random Node in a stream.”

Weighted random pick

Choose an index with probability proportional to its weight: build a prefix-sum of weights, draw a uniform value in [0, total), and binary-search the bucket — O(log n) per pick after O(n) setup (“Random Pick with Weight”).

Rejection sampling

Generate from an easy distribution and discard out-of-range draws to hit a target one:

  • Rand10 from Rand7 — draw in a 7×7 grid (49 outcomes), use the first 40, reject the rest.
  • Random point in a circle — sample in the bounding square, reject points outside the circle.

Expected draws are small as long as the accept region is a decent fraction of the sample space.

Quickselect (expected-O(n) selection)

Randomized pivoting finds the k-th element in expected O(n) (worst O(n²)) — see the sorting and divide-and-conquer primers. The go-to for “Kth Largest” / “Top-K” when you can mutate the array and want better-than-sort average time.

Randomized data structures (name them)

  • Skip list — a probabilistic balanced structure (ordered set with expected O(log n) ops).
  • Treap — BST + heap priorities for expected balance.
  • Bloom filter / HyperLogLog — probabilistic membership / cardinality (heavily used in system design).

Las Vegas vs Monte Carlo

  • Las Vegas — always correct, runtime is random (quickselect, randomized quicksort).
  • Monte Carlo — fixed runtime, answer is correct with high probability (Rabin–Karp hash match, Miller–Rabin primality). Repeat to drive the error probability down.

Pitfalls

  • Off-by-one in the range → non-uniform distribution (the #1 bug; Fisher–Yates must pick from 0..i inclusive, shrinking).
  • Reusing a biased primitive — composing biased samples compounds the bias; verify each step is uniform.
  • Reasoning about worst case when the guarantee is expected — state which you mean.

Choosing in an interview

  • Shuffle an array → Fisher–Yates.
  • Sample from a stream / unknown length → reservoir sampling.
  • Pick by weight → prefix sum + binary search.
  • Map one uniform generator to another → rejection sampling.
  • k-th element fast on average → quickselect.

Always state the expected complexity and argue the distribution is uniform — that argument is the real test.