Skip to content
System design course
Ch.4 · Designing real systems·how to build it ·7 min read

Building search autocomplete (typeahead)

Implement the trie with cached top-k per node, the offline aggregation that builds it, prefix sharding, and client debouncing.


The trie node with cached top-k

Each node caches the top-k completions for its prefix, so a query returns them directly after the walk:

class TrieNode:
    def __init__(self):
        self.children = {}              # char -> TrieNode
        self.top_k = []                 # [(query, score)] precomputed, sorted desc

def suggest(root, prefix, k=10):
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None: return []      # no completions for this prefix
    return node.top_k[:k]               # O(prefix) walk, then O(1) return

The win: no traversal of descendants at query time — the answer is precomputed at the node.

Building the trie from query logs

A batch (or streaming) job aggregates query frequencies, then inserts each query into the trie and propagates its score to every prefix node’s top-k:

def build(query_counts, k=10):
    root = TrieNode()
    for query, score in query_counts.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            push_top_k(node.top_k, (query, score), k)   # keep node's best k
    return root

def push_top_k(heapish, item, k):
    heapish.append(item)
    heapish.sort(key=lambda x: -x[1])
    del heapish[k:]                     # keep only top k

Each prefix node now knows its best k completions. Build offline, then ship the serialized trie to serving nodes (atomic swap to the new version).

Re-run the build periodically (hourly) over a recent window so trending queries climb the rankings; blend long-term popularity with a recency-weighted count. A streaming job can apply incremental count deltas and bump affected nodes’ top-k between full rebuilds.

Sharding the trie

The full trie is huge, so shard by prefix — e.g. all prefixes starting “a” on one shard set. A router sends a request to the shard owning its first character(s); replicate hot shards (short, common prefixes get the most traffic).

def route(prefix):
    return shards[hash(prefix[:1]) % num_shards]   # or a prefix->shard map

Client-side debouncing

Don’t fire a request per keystroke — wait until typing pauses briefly:

input.on("keyup", debounce(() => fetchSuggestions(input.value), 150));

Debounce + caching short prefixes (CDN/edge) slashes request volume — most of the win on the read side.

Serving and caching

Suggestions for very common prefixes (“a”, “fa”, “we”) are cached at the edge/CDN since they’re identical for everyone and queried constantly. Personalized results merge a small per-user signal at serve time on top of the global top-k.

Failure handling and edge cases

  • Stale trie is acceptable — serving an hour-old top-k is fine; never block reads on the rebuild.
  • Misspellings → add fuzzy matching (edit distance) or index common typos.
  • Cold/new query not yet in the trie → simply no suggestion until the next build picks it up.
  • Profanity/safety → filter the completion set at build time.
  • Hot prefix shard → replicate it; the edge cache absorbs most of it anyway.

The takeaway

Concrete signals: a trie with precomputed top-k per node for O(prefix) lookups, an offline aggregation that builds and atomically ships it, prefix sharding + edge caching of common prefixes, and client debounce. It’s a clean fan-out-on-write (precompute reads) design — the same instinct powers feeds and search ranking.