Database indexes
How a lookup goes from scanning every row to a logarithmic jump — and why every index you add taxes your writes.
The problem they solve
Without an index, finding rows that match a condition means a full table scan — read every row and check it, O(n). At a billion rows that’s hopeless on the hot path. An index is an auxiliary data structure that lets the database jump straight to the matching rows, turning O(n) into roughly O(log n).
The mental model: an index is the book’s index at the back. Instead of reading every page to find “quorum,” you look up the term and jump to the page.
How they work
Most relational indexes are B-trees (balanced, sorted) — they keep keys in
order so the DB can binary-search and also scan ranges efficiently (WHERE age BETWEEN 20 AND 30, ORDER BY). Hash indexes give O(1) exact-match lookups
but can’t do ranges. Specialized engines add inverted indexes (for full-text
search), LSM-trees (write-optimized), and geospatial indexes.
The cost: writes get slower
This is the trade-off interviewers want you to name. An index is a copy of the
data kept sorted, so every INSERT, UPDATE, or DELETE must update every
relevant index too. Indexes also consume storage and memory.
Indexes make reads fast and writes slower. Add them for the columns you query and filter on — not for every column.
So a write-heavy table wants few indexes; a read-heavy table that filters and sorts many ways can afford more. This is a direct lever you adjust based on the read:write ratio you estimated.
Things worth knowing for interviews
- Primary index / clustered index — the table is physically stored in primary key order; there’s one. Lookups by primary key are fastest.
- Secondary index — an extra index on a non-primary column; it points back to the row. You can have many.
- Composite index — an index on multiple columns
(a, b). Order matters: it helps queries filtering ona, orathenb, but notbalone. - Covering index — includes every column a query needs, so the DB answers from the index without touching the table at all.
When it comes up in a design
Two moments: (1) deciding access patterns — “we query by user_id and sort by
created_at, so I’ll index (user_id, created_at)”; and (2) explaining a
write-throughput bottleneck — “this table is write-heavy, so I’ll keep indexes
minimal and push search to a separate inverted index updated asynchronously.”
That last move (offloading search to a dedicated index like Elasticsearch) shows
up constantly in Chapter 4.