Building Google Maps
Implement A* and the contraction-hierarchies query, live-traffic edge weighting, tile serving, and rerouting.
The graph and A*
Roads as a weighted graph; A* (Dijkstra + a goal-directed heuristic) is the baseline:
def a_star(graph, src, dst):
pq = [(0, src)] # (estimated total, node)
g = {src: 0} # best known cost to node
while pq:
_, u = heappop(pq)
if u == dst: return reconstruct(u)
for v, w in graph.edges(u): # w = live travel time
ng = g[u] + w
if ng < g.get(v, INF):
g[v] = ng
f = ng + heuristic(v, dst) # straight-line/lower-bound time
heappush(pq, (f, v))
A* is fine for short routes but still explores too much continent-scale → precompute.
Contraction hierarchies (precomputed shortcuts)
Offline, “contract” nodes least-to-most important, adding shortcut edges that preserve shortest paths; online, run a bidirectional search that only ever goes “upward” in the hierarchy, touching a tiny fraction of nodes:
offline: order nodes by importance; contract each, adding shortcuts where needed
→ an augmented graph (original edges + shortcuts), stored
online: bidirectional Dijkstra restricted to higher-importance neighbors
→ meets in the middle in milliseconds (vs exploring the whole map)
The query is orders of magnitude faster; the cost is the heavy preprocessing + extra shortcut storage — the classic precompute-for-fast-reads trade.
Live-traffic edge weights
Edge weights are travel times derived from live speeds; a traffic pipeline keeps them current:
def traffic_pipeline():
for batch in gps_stream.consume_batched(window="1m"):
for segment, speeds in group_by_segment(batch): # anonymized phone GPS
avg = robust_mean(speeds)
edge_weight[segment] = length(segment) / avg # current travel time
# routing reads edge_weight; CH uses CRP-style re-customization to absorb changes
Because changing weights breaks static CH, production uses customizable route planning (separate the graph structure from the weights so only weights re-customize, cheaply).
ETA
ETA = the route’s summed (traffic-adjusted) edge times, plus near-future traffic prediction for segments you’ll reach later. Recompute as the trip progresses.
Map tiles
Serve pre-rendered tiles from a CDN, addressed by zoom/x/y (a quadtree):
def tile(z, x, y):
return cdn_url(f"tiles/{z}/{x}/{y}.pbf") # vector tile, rendered client-side
Pan/zoom fetches visible tiles; immutable + CDN-cached = cheap, fast, global (media-system pattern).
Rerouting (turn-by-turn)
During navigation, if the driver goes off-route or traffic shifts, recompute from the current position:
def on_position(trip, lat, lng):
if off_route(trip.route, lat, lng) or traffic_worsened(trip.route):
trip.route = route(current_node(lat, lng), trip.destination) # fast CH query
push(trip, trip.route) # update nav instructions
Fast CH queries make continuous rerouting affordable.
Scale and failure handling
- Graph + shortcuts sharded/replicated by region; routing across regions stitches boundary nodes.
- Routing service is stateless and horizontally scaled; queries are fast (CH).
- Traffic pipeline is high-volume → streamed/aggregated; weights updated continuously.
- Tiles → CDN absorbs all map rendering load.
- Stale traffic → degrade to historical speeds for a segment if live data is missing.
The takeaway
Concrete signals: precomputed contraction hierarchies (with customizable re-weighting) for millisecond routing instead of plain Dijkstra, live-traffic edge weights from aggregated GPS, CDN-served tiles, and fast rerouting. Graph precomputation is the core idea — trading offline preprocessing for interactive queries.