Union–Find (Disjoint Set Union)
Near-O(1) merging and connectivity — path compression + union by rank, with the components, cycle-detection, and Kruskal's-MST uses (and weighted DSU).
The idea
Union–Find (DSU) maintains a partition of elements into disjoint sets and answers “are these two
in the same set?” (find) and “merge these two sets” (union) in near-constant amortized
time — O(α(n)), the inverse Ackermann function, effectively ≤ 4. The two optimizations that
get there are path compression and union by rank/size.
class DSU:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
self.count = n # number of disjoint sets
def find(self, x):
while self.parent[x] != x:
self.parent[x] = self.parent[self.parent[x]] # path compression (halving)
x = self.parent[x]
return x
def union(self, a, b):
ra, rb = self.find(a), self.find(b)
if ra == rb: return False # already connected (edge closes a cycle)
if self.rank[ra] < self.rank[rb]: ra, rb = rb, ra
self.parent[rb] = ra # attach smaller tree under larger
if self.rank[ra] == self.rank[rb]: self.rank[ra] += 1
self.count -= 1
return True
Why both optimizations matter
- Union by rank/size keeps trees shallow by always hanging the smaller tree under the larger — without it, unions can build a linked-list-deep tree (O(n) finds).
- Path compression flattens the path to the root during
find, so repeated queries get cheaper. Together they give the near-O(1) bound; either alone is worse.
Track count to get the number of components for free.
When to reach for it
- Connected components — union every edge, read
count(“Number of Provinces,” “Number of Connected Components,” “Accounts Merge,” “Most Stones Removed”). - Cycle detection in an undirected graph —
unionreturningFalsemeans the edge connects two already-joined nodes → a cycle (“Redundant Connection,” “Graph Valid Tree”). - Dynamic connectivity — edges/equivalences arrive incrementally and you only need connectivity; DSU beats re-running DFS per query.
- Kruskal’s MST — sort edges, add each that doesn’t form a cycle (DSU is the cycle check); see the advanced-graphs primer.
- Grouping by equivalence — emails to accounts, equations for equality, “similar strings.”
Variants worth knowing
- Weighted / relational DSU — store an offset to the parent to track relationships (e.g.
ratios in “Evaluate Division,” or parity for “are these in opposite sets”).
findaccumulates the offset. - DSU on a grid — map
(r, c) → r*cols + c; union adjacent same-type cells (“Number of Islands,” with the rollback trick for “make connections over time”). - Rollback / persistent DSU — undo unions (no path compression, union by rank only) for offline queries.
DSU vs DFS/BFS
| Union–Find | DFS/BFS | |
|---|---|---|
| Incremental edges, many connectivity queries | great (near-O(1) each) | re-traverse per query |
| One-shot components on a static graph | fine | fine |
| Shortest path / actual traversal order | no | yes |
Reach for DSU when edges arrive over time and you only need connectivity; use traversal when you need paths or to visit nodes.
Pitfalls
- Forgetting an optimization — without union-by-rank and compression you lose the guarantee.
- Indexing — map non-integer elements (strings, coordinates) to integer ids first (a dict).
- Counting — decrement
countonly on a successful union.
Choosing in an interview
If the problem is “are X and Y connected,” “how many groups,” or “does adding this edge create a
cycle” — and especially if edges stream in — it’s union-find. Mention α(n) near-O(1) and the two
optimizations.