Randomized Algorithms
Uniform sampling, fair shuffles, reservoir sampling, rejection sampling, weighted pick, and expected-time selection — with the off-by-one bias traps.
Why randomness helps
Randomization buys simplicity and expected-time guarantees without worst-case fragility — uniform sampling, unbiased shuffles, average-O(n) selection, and Monte-Carlo estimates. The catch: you must reason in expectation and prove the distribution is actually uniform (a subtle off-by-one is the classic bug).
Fisher–Yates shuffle
The only correct in-place uniform shuffle — swap each position with a random one at or after it:
import random
def shuffle(a):
for i in range(len(a) - 1, 0, -1):
j = random.randint(0, i) # inclusive 0..i — the key to uniformity
a[i], a[j] = a[j], a[i]
return a
The common broken version picks j from the whole array each step — that’s biased (not all
n! permutations equally likely). The range must shrink.
Reservoir sampling
Pick k items uniformly from a stream of unknown length in O(1) memory — keep the i-th
item with probability k/i:
def sample_one(stream): # k = 1
chosen = None
for i, x in enumerate(stream, 1):
if random.randint(1, i) == 1: # keep x with probability 1/i
chosen = x
return chosen
For k > 1, fill the reservoir with the first k, then replace a random slot with probability k/i. Powers “Linked List Random Node,” “Random Node in a stream.”
Weighted random pick
Choose an index with probability proportional to its weight: build a prefix-sum of weights,
draw a uniform value in [0, total), and binary-search the bucket — O(log n) per pick after
O(n) setup (“Random Pick with Weight”).
Rejection sampling
Generate from an easy distribution and discard out-of-range draws to hit a target one:
- Rand10 from Rand7 — draw in a 7×7 grid (49 outcomes), use the first 40, reject the rest.
- Random point in a circle — sample in the bounding square, reject points outside the circle.
Expected draws are small as long as the accept region is a decent fraction of the sample space.
Quickselect (expected-O(n) selection)
Randomized pivoting finds the k-th element in expected O(n) (worst O(n²)) — see the sorting and divide-and-conquer primers. The go-to for “Kth Largest” / “Top-K” when you can mutate the array and want better-than-sort average time.
Randomized data structures (name them)
- Skip list — a probabilistic balanced structure (ordered set with expected O(log n) ops).
- Treap — BST + heap priorities for expected balance.
- Bloom filter / HyperLogLog — probabilistic membership / cardinality (heavily used in system design).
Las Vegas vs Monte Carlo
- Las Vegas — always correct, runtime is random (quickselect, randomized quicksort).
- Monte Carlo — fixed runtime, answer is correct with high probability (Rabin–Karp hash match, Miller–Rabin primality). Repeat to drive the error probability down.
Pitfalls
- Off-by-one in the range → non-uniform distribution (the #1 bug; Fisher–Yates must pick from
0..iinclusive, shrinking). - Reusing a biased primitive — composing biased samples compounds the bias; verify each step is uniform.
- Reasoning about worst case when the guarantee is expected — state which you mean.
Choosing in an interview
- Shuffle an array → Fisher–Yates.
- Sample from a stream / unknown length → reservoir sampling.
- Pick by weight → prefix sum + binary search.
- Map one uniform generator to another → rejection sampling.
- k-th element fast on average → quickselect.
Always state the expected complexity and argue the distribution is uniform — that argument is the real test.