Building a metrics & logging platform
Implement the batched ingestion pipeline, time-series compression, rollup downsampling, and approximate percentile queries.
Ingestion: agents → queue → processors
Agents batch locally and push; a queue absorbs bursts; processors write to storage:
# agent (on each host): batch points, flush periodically
def agent_flush(buffer):
collector.send(compress(buffer)) # batched, compressed over the wire
# collector → queue
def on_receive(batch):
for point in decompress(batch):
kafka.publish("metrics", key=point.series_id, value=point) # partition by series
# stream processor
def process():
for point in kafka.consume("metrics"):
enrich(point) # add tags/host metadata
tsdb.append(point) # write to the time-series store
rollup_aggregator.update(point) # feed live rollups
Batching + the queue is what sustains millions of points/sec and survives spikes (back- pressure, replay) without dropping data.
Time-series storage and compression
A data point is (series_id, timestamp, value) where series_id = hash(metric, tags).
Store per-series, time-ordered, and compress hard:
- delta-of-delta encode timestamps (regular intervals compress to ~bits)
- XOR-compress consecutive float values (Gorilla-style)
- partition by time window (e.g. 2h blocks) + series → queries scan only relevant blocks
This gets ~1–2 bytes per point and makes “series X over range T” a sequential block read. Logs instead go to an inverted index (parse fields → index) for text/field search.
Rollups (downsampling)
Pre-aggregate raw points into coarser intervals so long-range queries are cheap:
def rollup_aggregator(): # continuous, per series + interval
for window in tumbling(raw_stream, sizes=["1m", "1h", "1d"]):
agg = {"sum": sum(window.values), "count": len(window.values),
"min": min(window.values), "max": max(window.values),
"p99_sketch": tdigest(window.values)} # mergeable percentile sketch
rollup_store.write(window.series, window.size, window.end, agg)
A “CPU over the last year” query reads 1d rollups (365 points), not billions of raw points. Retention tiers: raw for days, 1m for weeks, 1h/1d for years, then expire/archive.
Querying with the right resolution + approximation
The query engine picks the coarsest rollup that satisfies the time range, then merges:
def query(metric, tags, start, end, agg):
res = choose_resolution(end - start) # long range → coarse rollup
series = storage.read(series_id(metric, tags), start, end, res)
if agg == "p99":
return merge_tdigests(s.p99_sketch for s in series).quantile(0.99) # approximate
return reduce(agg, series)
Percentiles use mergeable sketches (t-digest) and cardinality uses HyperLogLog — exact would be prohibitively expensive at this volume. Cache hot dashboard queries.
Alerting (streaming evaluation)
Evaluate rules against the live stream/recent rollups; dedupe and group before paging:
def evaluate_alerts():
for rule in alert_rules: # e.g. p99 latency > 500ms for 5m
if rule.fires(recent(rule.metric, rule.window)):
key = rule.dedupe_key()
if alert_state.transition_to_firing(key): # only on state change
notify_grouped(rule, severity=rule.severity) # notification service
State transitions (ok→firing→resolved) and grouping prevent alert storms (one outage = one page, not thousands).
Scale and failure handling
- Ingestion spike → queue buffers; processors autoscale on lag; agents batch.
- Processor crash → resume from the Kafka offset (at-least-once; writes idempotent by series+timestamp).
- Storage growth → rollups + retention tiers + compression keep cost bounded.
- High-cardinality explosion (too many tag combinations) → the real failure mode; cap/ drop high-cardinality tags, alert on cardinality.
- Some loss tolerable → sampling logs, approximate metrics — acceptable for observability.
The takeaway
Concrete signals: batched queue-buffered ingestion (write-heavy), compressed time- series storage partitioned by time+series, rollups + retention tiers so range queries read aggregates, and approximate sketches (t-digest/HLL) for percentiles/cardinality, with stateful grouped alerting. Write-optimized pipeline + precomputed rollups is the analytics counterpart to the read-heavy fan-out feeds — and the capstone of Chapter 4.