Skip to content
All reading material
Foundations·18 min read

Arrays & Hashing

The full toolkit — dynamic arrays, hash maps and sets, frequency counting, and the in-place array algorithms (Kadane, Dutch flag, cyclic sort, majority vote).


Why this is the foundation

A huge fraction of interview problems reduce to two ideas: O(1) indexed access (arrays) and O(1) average lookup by key (hash maps). Master these and the recurring move — trade space for time by remembering what you’ve seen — and most “easy/medium” problems fall.

Arrays: how they actually work

An array is a contiguous block of memory, so element i is at base + i × sizeO(1) random access. The consequences:

OperationCostWhy
Access / update by indexO(1)direct address
Append (dynamic array)amortized O(1)occasional resize
Insert / delete at front or middleO(n)shift everything after
Search (unsorted)O(n)scan
Search (sorted)O(log n)binary search

Dynamic arrays (Python list, Java ArrayList) grow by doubling capacity when full. Each resize is O(n), but doublings are rare enough that n appends cost O(n) total — amortized O(1) per append. Know this; it’s a classic follow-up.

Hash maps and sets: how they actually work

A hash map stores key→value by computing hash(key) % capacity to find a bucket.

  • Average O(1) get/put/delete; worst case O(n) if many keys collide (mitigated by good hash functions and resizing at a load factor threshold).
  • Collisions are handled by chaining (linked lists/trees per bucket) or open addressing.
  • Sets are hash maps without values — O(1) membership tests.
  • Keys must be hashable (immutable): numbers, strings, tuples — not lists/dicts.

The interview-relevant truth: hash operations are O(1) average, and that’s what turns O(n²) scans into O(n).

Pattern 1 — Remember what you’ve seen (the core move)

When brute force checks every pair (O(n²)), ask: “what if I’d already stored what I’m looking for?” A hash map answers in one pass:

def two_sum(nums, target):
    seen = {}                       # value -> index
    for i, x in enumerate(nums):
        if target - x in seen:
            return [seen[target - x], i]
        seen[x] = i

O(n) time, O(n) space. This single idea powers two-sum, “contains duplicate,” “longest consecutive sequence,” and countless others.

Pattern 2 — Frequency counting

Counting occurrences is ubiquitous — anagrams, majority, top-K, sliding windows:

from collections import Counter
counts = Counter("mississippi")     # {'i': 4, 's': 4, 'p': 2, 'm': 1}

For a bounded alphabet (e.g. lowercase letters), a fixed int[26] is faster and O(1) space. Comparing two frequency maps decides anagrams in O(n).

Pattern 3 — Hashing tricks for grouping

The key you choose unlocks the problem:

  • Group anagrams → key by the sorted string or a count tuple (0,2,0,...).
  • Dedup / seen-before → a set, or a frozenset/tuple for composite keys.
  • Constant-time getRandom + dedup → hash map of value→index alongside an array (the “Insert Delete GetRandom” design).

The in-place array algorithms (know these by name)

Array problems often want O(1) extra space via clever index manipulation:

  • Kadane’s algorithm — maximum subarray sum in O(n): track the best sum ending here.

    def max_subarray(nums):
        best = cur = nums[0]
        for x in nums[1:]:
            cur = max(x, cur + x)       # extend or restart
            best = max(best, cur)
        return best
  • Dutch National Flag (sort colors) — partition into <, =, > a pivot in one pass with three pointers; O(n), O(1).

  • Boyer–Moore majority vote — find the element appearing > n/2 times in O(n)/O(1) by canceling pairs.

  • Cyclic sort — when values are in range [1, n], place each at its index in O(n) to find missing/duplicate numbers without extra space.

  • In-place reverse / rotate — reverse subranges to rotate an array by k in O(n)/O(1) (reverse whole, then reverse the two parts).

  • Two pointers to overwrite in place (remove duplicates from a sorted array, move zeroes) — a write pointer trails a read pointer. (See the two-pointers primer.)

Prefix sums (preview)

Cumulative sums let you answer “sum of range [i, j]” in O(1) after O(n) preprocessing, and turn “subarray summing to k” into a hash-map one-pass. It’s important enough to have its own primer — reach for it whenever you see repeated range queries or subarray-sum counting.

Common pitfalls

  • Mutating a list while iterating it (iterate a copy, or build a new list).
  • Using a mutable object as a key (lists aren’t hashable — use a tuple).
  • KeyError — use dict.get(k, default) or collections.defaultdict.
  • Hash-map space cost — if the input is already sorted, two pointers or binary search often solve it in O(1) space; weigh the memory before reaching for a map.

Choosing in an interview

  • Need to look something up by value/key fast → hash map/set (O(1), O(n) space).
  • Counting / anagram / top-K → frequency map.
  • Array is sortedtwo pointers or binary search (O(1) space).
  • Max/optimal contiguous subarray → Kadane.
  • Values bounded to [1,n] or a small alphabet → cyclic sort / counting (O(1) space).

State the time and space trade — “O(n) time with a hash map, or O(1) space with two pointers if I can sort” — because that comparison is exactly the signal interviewers want.