Building search autocomplete (typeahead)
Implement the trie with cached top-k per node, the offline aggregation that builds it, prefix sharding, and client debouncing.
The trie node with cached top-k
Each node caches the top-k completions for its prefix, so a query returns them directly after the walk:
class TrieNode:
def __init__(self):
self.children = {} # char -> TrieNode
self.top_k = [] # [(query, score)] precomputed, sorted desc
def suggest(root, prefix, k=10):
node = root
for ch in prefix:
node = node.children.get(ch)
if node is None: return [] # no completions for this prefix
return node.top_k[:k] # O(prefix) walk, then O(1) return
The win: no traversal of descendants at query time — the answer is precomputed at the node.
Building the trie from query logs
A batch (or streaming) job aggregates query frequencies, then inserts each query into the trie and propagates its score to every prefix node’s top-k:
def build(query_counts, k=10):
root = TrieNode()
for query, score in query_counts.items():
node = root
for ch in query:
node = node.children.setdefault(ch, TrieNode())
push_top_k(node.top_k, (query, score), k) # keep node's best k
return root
def push_top_k(heapish, item, k):
heapish.append(item)
heapish.sort(key=lambda x: -x[1])
del heapish[k:] # keep only top k
Each prefix node now knows its best k completions. Build offline, then ship the serialized trie to serving nodes (atomic swap to the new version).
Updating with trending terms
Re-run the build periodically (hourly) over a recent window so trending queries climb the rankings; blend long-term popularity with a recency-weighted count. A streaming job can apply incremental count deltas and bump affected nodes’ top-k between full rebuilds.
Sharding the trie
The full trie is huge, so shard by prefix — e.g. all prefixes starting “a” on one shard set. A router sends a request to the shard owning its first character(s); replicate hot shards (short, common prefixes get the most traffic).
def route(prefix):
return shards[hash(prefix[:1]) % num_shards] # or a prefix->shard map
Client-side debouncing
Don’t fire a request per keystroke — wait until typing pauses briefly:
input.on("keyup", debounce(() => fetchSuggestions(input.value), 150));
Debounce + caching short prefixes (CDN/edge) slashes request volume — most of the win on the read side.
Serving and caching
Suggestions for very common prefixes (“a”, “fa”, “we”) are cached at the edge/CDN since they’re identical for everyone and queried constantly. Personalized results merge a small per-user signal at serve time on top of the global top-k.
Failure handling and edge cases
- Stale trie is acceptable — serving an hour-old top-k is fine; never block reads on the rebuild.
- Misspellings → add fuzzy matching (edit distance) or index common typos.
- Cold/new query not yet in the trie → simply no suggestion until the next build picks it up.
- Profanity/safety → filter the completion set at build time.
- Hot prefix shard → replicate it; the edge cache absorbs most of it anyway.
The takeaway
Concrete signals: a trie with precomputed top-k per node for O(prefix) lookups, an offline aggregation that builds and atomically ships it, prefix sharding + edge caching of common prefixes, and client debounce. It’s a clean fan-out-on-write (precompute reads) design — the same instinct powers feeds and search ranking.