Building Uber
Implement the in-memory location index updated per ping, lease-based dispatch that prevents double-matching, the trip state machine, and live tracking.
Updating driver locations (the write firehose)
Each ping moves a driver between cells in an in-memory index; history streams off to storage:
def on_location(driver_id, lat, lng):
cell = h3.geo_to_cell(lat, lng, RES)
old = driver_cell.get(driver_id)
if old != cell:
geo_index.remove(old, driver_id) # leave old cell
geo_index.add(cell, driver_id) # join new cell
driver_cell[driver_id] = cell
driver_loc[driver_id] = (lat, lng, now()) # latest position (in memory)
location_stream.publish(driver_id, lat, lng) # history → storage / analytics
Only active drivers are in the index; updates are O(1) cell moves in memory (Redis or a sharded location service) — the DB never sees per-ping writes.
Matching candidates by cell
def find_candidates(lat, lng):
cell = h3.geo_to_cell(lat, lng, RES)
cells = [cell] + h3.neighbors(cell, k=2) # nearby rings
drivers = []
for c in cells:
drivers += [d for d in geo_index.get(c) if status[d] == "available"]
return drivers
Lease-based dispatch (no double-matching)
Offer the best driver and lease them so concurrent requests can’t grab the same driver, and the driver isn’t offered two rides:
def dispatch(ride_request):
candidates = find_candidates(ride_request.lat, ride_request.lng)
ranked = sorted(candidates, key=lambda d: eta(d, ride_request)) # ETA, not raw distance
for driver in ranked:
if lease.acquire(f"driver:{driver}", ttl="15s"): # atomic claim
if offer(driver, ride_request, timeout="10s"): # driver accepts?
start_trip(ride_request, driver)
return driver
lease.release(f"driver:{driver}") # declined → next candidate
return None # no driver found → retry/surge
The lease (atomic, TTL’d — the distributed-lock pattern) guarantees one rider ↔ one driver even under concurrent dispatch.
The trip state machine (consistent)
Trip state is transactional and idempotent — never lose a trip or double-charge:
TRANSITIONS = {"requested": "matched", "matched": "driver_arriving",
"driver_arriving": "in_progress", "in_progress": "completed",
"completed": "paid"}
def advance(trip_id, event):
with txn(): # transactional update
trip = trips.get(trip_id)
expected = TRANSITIONS.get(trip.state)
if event.target == expected and not trip.applied(event.id): # idempotent
trip.state = event.target
trips.put(trip_id, trip)
if trip.state == "completed": payments.charge(trip) # exactly-once charge
Live tracking
Rider and driver apps subscribe (WebSocket/pub-sub) to trip updates; the driver’s location stream is forwarded to the matched rider so they see the car move in real time — reusing the location stream + a pub/sub channel keyed by trip.
ETA and surge
- ETA — call the routing service (road graph + live traffic, next lesson), cached per region; recompute as the trip moves.
- Surge — a job continuously computes
demand/supplyper cell from the request and location streams and publishes a multiplier the pricing service reads.
Scale and failure handling
- Location writes → in-memory, sharded by cell/region; history async to storage.
- Geo index HA → replicated; on node loss, drivers re-ping and repopulate quickly.
- Driver declines / times out → lease expires, next candidate offered.
- No drivers nearby → widen radius, queue the request, raise surge to attract supply.
- Trip/payment → transactional + idempotent so crashes don’t double-charge or lose trips.
- Region hotspot (event letting out) → finer cells + more matching workers; surge balances demand.
The takeaway
Concrete signals: an in-memory cell index updated per ping (DB spared the firehose), lease-based dispatch so no driver is double-matched, ETA-ranked matching, and a transactional idempotent trip state machine for correctness. Real-time moving-entity indexing + lease matching is the reusable core (DoorDash is the same shape with restaurants and couriers).