Heaps & Priority Queues
Always-on access to the extreme element — heap mechanics, Top-K (heap vs quickselect), the two-heap median, k-way merge, and scheduling.
The idea
A binary heap keeps the minimum (or maximum) at the root with O(log n) push/pop and
O(1) peek. You never sort the whole thing — you just repeatedly grab the next extreme
element. Python’s heapq is a min-heap.
import heapq
h = []
heapq.heappush(h, 5); heapq.heappush(h, 1); heapq.heappush(h, 3)
heapq.heappop(h) # 1 (smallest)
How it works
A heap is a complete binary tree stored in an array: node i’s children are 2i+1 and
2i+2. The heap invariant (parent ≤ children for a min-heap) is restored by sift-up after a
push and sift-down after a pop — each O(log n) because the tree is balanced. Building a heap
from n items is O(n) (heapq.heapify), not O(n log n) — a nice fact to know.
- Max-heap in Python — push negatives (
-x), or wrap values. - Order by a key — push tuples
(priority, item); add a tiebreaker counter if items aren’t comparable.
Top-K with a bounded heap
To keep the K largest, hold a min-heap of size K and evict the smallest when it overflows — O(n log k), far better than sorting when k ≪ n:
def k_largest(nums, k):
h = []
for x in nums:
heapq.heappush(h, x)
if len(h) > k: heapq.heappop(h) # drop the smallest of the K+1
return h # the k largest (unordered)
Top-K trade-off: sorting is O(n log n); a bounded heap is O(n log k); quickselect is O(n) average (see sorting/divide-and-conquer) but unstable and worst-case O(n²). Name all three and pick by constraints (streaming → heap; one-shot, average-case → quickselect).
The two-heap median
Maintain a max-heap of the lower half and a min-heap of the upper half, balanced in size; the median is the root(s). O(log n) insert, O(1) median — “Find Median from a Data Stream,” “sliding window median.”
k-way merge
A heap of the current head of each of k sorted sequences merges them in O(N log k) — “merge k sorted lists,” “smallest range covering k lists,” external sort. Pop the min, push its successor.
Scheduling and “next extreme” problems
- Task Scheduler / reorganize string — repeatedly take the most frequent remaining item.
- Meeting Rooms II — a min-heap of end times tells you the peak concurrency.
- Dijkstra / Prim — a min-heap of (distance, node) drives weighted shortest path / MST (see the advanced-graphs primer).
- “K closest points,” “kth largest in a stream” — bounded heap.
Heuristic: if you keep needing “the current smallest/largest/most-urgent,” it’s a heap.
Heap vs balanced BST / sorted structure
A heap gives O(1) min but O(n) search and no ordered iteration. If you need ordered traversal, predecessor/successor, or arbitrary deletion, use a balanced BST / ordered set (its own primer). For arbitrary deletion from a heap, use lazy deletion: mark removed entries and skip them when they surface.
Pitfalls
- Max-heap — remember to negate (and negate back).
- Unorderable tuples — add a unique counter as a tiebreaker to avoid comparing the payload.
- Rebuilding instead of updating — for “decrease-key” (Dijkstra), push a new entry and lazily skip stale ones rather than searching the heap.
Choosing in an interview
- “K largest/smallest/most-frequent” → bounded heap (or quickselect for one-shot O(n)).
- Streaming median → two heaps.
- Merge k sorted → heap of heads.
- “Always take the most urgent/soonest” (scheduling, rooms) → min-heap of the key.
- Weighted shortest path / MST → heap-backed Dijkstra/Prim.