Union–Find (Disjoint Set Union)

Near-O(1) merging and connectivity — path compression + union by rank, with the components, cycle-detection, and Kruskal's-MST uses (and weighted DSU).

The idea

Union–Find (DSU) maintains a partition of elements into disjoint sets and answers “are these two in the same set?” (find) and “merge these two sets” (union) in near-constant amortized time — O(α(n)), the inverse Ackermann function, effectively ≤ 4. The two optimizations that get there are path compression and union by rank/size.

class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
        self.count = n                       # number of disjoint sets
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path compression (halving)
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb: return False            # already connected (edge closes a cycle)
        if self.rank[ra] < self.rank[rb]: ra, rb = rb, ra
        self.parent[rb] = ra                 # attach smaller tree under larger
        if self.rank[ra] == self.rank[rb]: self.rank[ra] += 1
        self.count -= 1
        return True

Why both optimizations matter

Union by rank/size keeps trees shallow by always hanging the smaller tree under the larger — without it, unions can build a linked-list-deep tree (O(n) finds).
Path compression flattens the path to the root during find, so repeated queries get cheaper. Together they give the near-O(1) bound; either alone is worse.

Track count to get the number of components for free.

When to reach for it

Connected components — union every edge, read count (“Number of Provinces,” “Number of Connected Components,” “Accounts Merge,” “Most Stones Removed”).
Cycle detection in an undirected graph — union returning False means the edge connects two already-joined nodes → a cycle (“Redundant Connection,” “Graph Valid Tree”).
Dynamic connectivity — edges/equivalences arrive incrementally and you only need connectivity; DSU beats re-running DFS per query.
Kruskal’s MST — sort edges, add each that doesn’t form a cycle (DSU is the cycle check); see the advanced-graphs primer.
Grouping by equivalence — emails to accounts, equations for equality, “similar strings.”

Variants worth knowing

Weighted / relational DSU — store an offset to the parent to track relationships (e.g. ratios in “Evaluate Division,” or parity for “are these in opposite sets”). find accumulates the offset.
DSU on a grid — map (r, c) → r*cols + c; union adjacent same-type cells (“Number of Islands,” with the rollback trick for “make connections over time”).
Rollback / persistent DSU — undo unions (no path compression, union by rank only) for offline queries.

DSU vs DFS/BFS

	Union–Find	DFS/BFS
Incremental edges, many connectivity queries	great (near-O(1) each)	re-traverse per query
One-shot components on a static graph	fine	fine
Shortest path / actual traversal order	no	yes

Reach for DSU when edges arrive over time and you only need connectivity; use traversal when you need paths or to visit nodes.

Pitfalls

Forgetting an optimization — without union-by-rank and compression you lose the guarantee.
Indexing — map non-integer elements (strings, coordinates) to integer ids first (a dict).
Counting — decrement count only on a successful union.

Choosing in an interview

If the problem is “are X and Y connected,” “how many groups,” or “does adding this edge create a cycle” — and especially if edges stream in — it’s union-find. Mention α(n) near-O(1) and the two optimizations.