Database indexes · Lyte Code

How a lookup goes from scanning every row to a logarithmic jump — and why every index you add taxes your writes.

The problem they solve

Without an index, finding rows that match a condition means a full table scan — read every row and check it, O(n). At a billion rows that’s hopeless on the hot path. An index is an auxiliary data structure that lets the database jump straight to the matching rows, turning O(n) into roughly O(log n).

The mental model: an index is the book’s index at the back. Instead of reading every page to find “quorum,” you look up the term and jump to the page.

How they work

Most relational indexes are B-trees (balanced, sorted) — they keep keys in order so the DB can binary-search and also scan ranges efficiently (WHERE age BETWEEN 20 AND 30, ORDER BY). Hash indexes give O(1) exact-match lookups but can’t do ranges. Specialized engines add inverted indexes (for full-text search), LSM-trees (write-optimized), and geospatial indexes.

The cost: writes get slower

This is the trade-off interviewers want you to name. An index is a copy of the data kept sorted, so every INSERT, UPDATE, or DELETE must update every relevant index too. Indexes also consume storage and memory.

Indexes make reads fast and writes slower. Add them for the columns you query and filter on — not for every column.

So a write-heavy table wants few indexes; a read-heavy table that filters and sorts many ways can afford more. This is a direct lever you adjust based on the read:write ratio you estimated.

Things worth knowing for interviews

Primary index / clustered index — the table is physically stored in primary key order; there’s one. Lookups by primary key are fastest.
Secondary index — an extra index on a non-primary column; it points back to the row. You can have many.
Composite index — an index on multiple columns (a, b). Order matters: it helps queries filtering on a, or a then b, but not b alone.
Covering index — includes every column a query needs, so the DB answers from the index without touching the table at all.

When it comes up in a design

Two moments: (1) deciding access patterns — “we query by user_id and sort by created_at, so I’ll index (user_id, created_at)”; and (2) explaining a write-throughput bottleneck — “this table is write-heavy, so I’ll keep indexes minimal and push search to a separate inverted index updated asynchronously.” That last move (offloading search to a dedicated index like Elasticsearch) shows up constantly in Chapter 4.