Arrays & Hashing
The full toolkit — dynamic arrays, hash maps and sets, frequency counting, and the in-place array algorithms (Kadane, Dutch flag, cyclic sort, majority vote).
Why this is the foundation
A huge fraction of interview problems reduce to two ideas: O(1) indexed access (arrays) and O(1) average lookup by key (hash maps). Master these and the recurring move — trade space for time by remembering what you’ve seen — and most “easy/medium” problems fall.
Arrays: how they actually work
An array is a contiguous block of memory, so element i is at base + i × size →
O(1) random access. The consequences:
| Operation | Cost | Why |
|---|---|---|
| Access / update by index | O(1) | direct address |
| Append (dynamic array) | amortized O(1) | occasional resize |
| Insert / delete at front or middle | O(n) | shift everything after |
| Search (unsorted) | O(n) | scan |
| Search (sorted) | O(log n) | binary search |
Dynamic arrays (Python list, Java ArrayList) grow by doubling capacity when full.
Each resize is O(n), but doublings are rare enough that n appends cost O(n) total —
amortized O(1) per append. Know this; it’s a classic follow-up.
Hash maps and sets: how they actually work
A hash map stores key→value by computing hash(key) % capacity to find a bucket.
- Average O(1) get/put/delete; worst case O(n) if many keys collide (mitigated by good hash functions and resizing at a load factor threshold).
- Collisions are handled by chaining (linked lists/trees per bucket) or open addressing.
- Sets are hash maps without values — O(1) membership tests.
- Keys must be hashable (immutable): numbers, strings, tuples — not lists/dicts.
The interview-relevant truth: hash operations are O(1) average, and that’s what turns O(n²) scans into O(n).
Pattern 1 — Remember what you’ve seen (the core move)
When brute force checks every pair (O(n²)), ask: “what if I’d already stored what I’m looking for?” A hash map answers in one pass:
def two_sum(nums, target):
seen = {} # value -> index
for i, x in enumerate(nums):
if target - x in seen:
return [seen[target - x], i]
seen[x] = i
O(n) time, O(n) space. This single idea powers two-sum, “contains duplicate,” “longest consecutive sequence,” and countless others.
Pattern 2 — Frequency counting
Counting occurrences is ubiquitous — anagrams, majority, top-K, sliding windows:
from collections import Counter
counts = Counter("mississippi") # {'i': 4, 's': 4, 'p': 2, 'm': 1}
For a bounded alphabet (e.g. lowercase letters), a fixed int[26] is faster and O(1)
space. Comparing two frequency maps decides anagrams in O(n).
Pattern 3 — Hashing tricks for grouping
The key you choose unlocks the problem:
- Group anagrams → key by the sorted string or a count tuple
(0,2,0,...). - Dedup / seen-before → a
set, or afrozenset/tuplefor composite keys. - Constant-time getRandom + dedup → hash map of value→index alongside an array (the “Insert Delete GetRandom” design).
The in-place array algorithms (know these by name)
Array problems often want O(1) extra space via clever index manipulation:
-
Kadane’s algorithm — maximum subarray sum in O(n): track the best sum ending here.
def max_subarray(nums): best = cur = nums[0] for x in nums[1:]: cur = max(x, cur + x) # extend or restart best = max(best, cur) return best -
Dutch National Flag (sort colors) — partition into
<, =, >a pivot in one pass with three pointers; O(n), O(1). -
Boyer–Moore majority vote — find the element appearing > n/2 times in O(n)/O(1) by canceling pairs.
-
Cyclic sort — when values are in range
[1, n], place each at its index in O(n) to find missing/duplicate numbers without extra space. -
In-place reverse / rotate — reverse subranges to rotate an array by k in O(n)/O(1) (reverse whole, then reverse the two parts).
-
Two pointers to overwrite in place (remove duplicates from a sorted array, move zeroes) — a write pointer trails a read pointer. (See the two-pointers primer.)
Prefix sums (preview)
Cumulative sums let you answer “sum of range [i, j]” in O(1) after O(n) preprocessing, and turn “subarray summing to k” into a hash-map one-pass. It’s important enough to have its own primer — reach for it whenever you see repeated range queries or subarray-sum counting.
Common pitfalls
- Mutating a list while iterating it (iterate a copy, or build a new list).
- Using a mutable object as a key (lists aren’t hashable — use a tuple).
KeyError— usedict.get(k, default)orcollections.defaultdict.- Hash-map space cost — if the input is already sorted, two pointers or binary search often solve it in O(1) space; weigh the memory before reaching for a map.
Choosing in an interview
- Need to look something up by value/key fast → hash map/set (O(1), O(n) space).
- Counting / anagram / top-K → frequency map.
- Array is sorted → two pointers or binary search (O(1) space).
- Max/optimal contiguous subarray → Kadane.
- Values bounded to
[1,n]or a small alphabet → cyclic sort / counting (O(1) space).
State the time and space trade — “O(n) time with a hash map, or O(1) space with two pointers if I can sort” — because that comparison is exactly the signal interviewers want.