Skip to content
All reading material
Patterns·14 min read

Heaps & Priority Queues

Always-on access to the extreme element — heap mechanics, Top-K (heap vs quickselect), the two-heap median, k-way merge, and scheduling.


The idea

A binary heap keeps the minimum (or maximum) at the root with O(log n) push/pop and O(1) peek. You never sort the whole thing — you just repeatedly grab the next extreme element. Python’s heapq is a min-heap.

import heapq
h = []
heapq.heappush(h, 5); heapq.heappush(h, 1); heapq.heappush(h, 3)
heapq.heappop(h)        # 1  (smallest)

How it works

A heap is a complete binary tree stored in an array: node i’s children are 2i+1 and 2i+2. The heap invariant (parent ≤ children for a min-heap) is restored by sift-up after a push and sift-down after a pop — each O(log n) because the tree is balanced. Building a heap from n items is O(n) (heapq.heapify), not O(n log n) — a nice fact to know.

  • Max-heap in Python — push negatives (-x), or wrap values.
  • Order by a key — push tuples (priority, item); add a tiebreaker counter if items aren’t comparable.

Top-K with a bounded heap

To keep the K largest, hold a min-heap of size K and evict the smallest when it overflows — O(n log k), far better than sorting when k ≪ n:

def k_largest(nums, k):
    h = []
    for x in nums:
        heapq.heappush(h, x)
        if len(h) > k: heapq.heappop(h)   # drop the smallest of the K+1
    return h                              # the k largest (unordered)

Top-K trade-off: sorting is O(n log n); a bounded heap is O(n log k); quickselect is O(n) average (see sorting/divide-and-conquer) but unstable and worst-case O(n²). Name all three and pick by constraints (streaming → heap; one-shot, average-case → quickselect).

The two-heap median

Maintain a max-heap of the lower half and a min-heap of the upper half, balanced in size; the median is the root(s). O(log n) insert, O(1) median — “Find Median from a Data Stream,” “sliding window median.”

k-way merge

A heap of the current head of each of k sorted sequences merges them in O(N log k) — “merge k sorted lists,” “smallest range covering k lists,” external sort. Pop the min, push its successor.

Scheduling and “next extreme” problems

  • Task Scheduler / reorganize string — repeatedly take the most frequent remaining item.
  • Meeting Rooms II — a min-heap of end times tells you the peak concurrency.
  • Dijkstra / Prim — a min-heap of (distance, node) drives weighted shortest path / MST (see the advanced-graphs primer).
  • “K closest points,” “kth largest in a stream” — bounded heap.

Heuristic: if you keep needing “the current smallest/largest/most-urgent,” it’s a heap.

Heap vs balanced BST / sorted structure

A heap gives O(1) min but O(n) search and no ordered iteration. If you need ordered traversal, predecessor/successor, or arbitrary deletion, use a balanced BST / ordered set (its own primer). For arbitrary deletion from a heap, use lazy deletion: mark removed entries and skip them when they surface.

Pitfalls

  • Max-heap — remember to negate (and negate back).
  • Unorderable tuples — add a unique counter as a tiebreaker to avoid comparing the payload.
  • Rebuilding instead of updating — for “decrease-key” (Dijkstra), push a new entry and lazily skip stale ones rather than searching the heap.

Choosing in an interview

  • “K largest/smallest/most-frequent” → bounded heap (or quickselect for one-shot O(n)).
  • Streaming median → two heaps.
  • Merge k sorted → heap of heads.
  • “Always take the most urgent/soonest” (scheduling, rooms) → min-heap of the key.
  • Weighted shortest path / MST → heap-backed Dijkstra/Prim.