Designing a notification service
Build a system that delivers push, SMS, and email to millions of users across third-party channels — reliably, without duplicates, and respecting preferences.
The problem
Design a notification/fan-out service: a central system other services call to send messages to users across multiple channels — mobile push (APNs/FCM), SMS, email, in-app. It must handle huge volume, integrate flaky third-party providers, dedupe, respect user preferences, and not get blocked by one slow channel.
Step 1 — Requirements
Functional: accept a request of the form “notify user U with content C via channels X”; deliver via the right provider per channel; respect user preferences (opt-outs, quiet hours); support templates and localization; track delivery status; handle high-volume fan-out (e.g. “notify all followers”).
Non-functional: scalability (millions/sec at peak), reliability (at-least-once delivery), low latency for urgent notifications, fault isolation (a dead email provider mustn’t stall push), and no spam (dedup, rate-limit per user).
Step 2 — The architecture (queue-centric)
This is a textbook async, queue-driven design:
services → Notification API → validate + apply preferences → enqueue
                                                                │
      ┌────────────── Message queue (per channel) ──────────────┤
      ▼                            ▼                            ▼
push workers                  SMS workers                 email workers
      │                            │                            │
  APNs/FCM                      Twilio                    SES/SendGrid  (3rd-party providers)
- API validates, looks up preferences and device tokens, renders the template, and enqueues per channel.
- Per-channel queues + worker pools isolate failures and let each channel scale and rate-limit independently.
- Workers call the provider, handle retries, and record status.
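The API path above can be sketched in a few lines. This is a minimal illustration, not a production design: the in-memory `queue.Queue` objects stand in for a durable broker, and `PREFS`, `TEMPLATES`, `Job`, and `notify` are hypothetical names invented for the example. Device-token lookup is left out to keep it short.

```python
import queue
from dataclasses import dataclass

# Hypothetical in-memory stand-ins; a real system would use a durable broker.
CHANNEL_QUEUES = {ch: queue.Queue() for ch in ("push", "sms", "email")}

# Hypothetical preference store: (user_id, channel) -> enabled
PREFS = {("u1", "push"): True, ("u1", "sms"): False, ("u1", "email"): True}

# Hypothetical template store: template_id -> body template
TEMPLATES = {"welcome": "Hello {name}, welcome aboard!"}


@dataclass(frozen=True)
class Job:
    notification_id: str
    user_id: str
    channel: str
    body: str


def notify(notification_id, user_id, channels, template_id, params):
    """Validate, apply preferences, render, and enqueue one job per channel."""
    # Validate everything up front so a bad request enqueues nothing.
    if template_id not in TEMPLATES:
        raise ValueError(f"unknown template: {template_id}")
    unknown = [ch for ch in channels if ch not in CHANNEL_QUEUES]
    if unknown:
        raise ValueError(f"unknown channels: {unknown}")

    body = TEMPLATES[template_id].format(**params)
    enqueued = []
    for channel in channels:
        # Opted-out (or never-configured) channels are skipped, not failed.
        if not PREFS.get((user_id, channel), False):
            continue
        CHANNEL_QUEUES[channel].put(Job(notification_id, user_id, channel, body))
        enqueued.append(channel)
    return enqueued
```

Because each channel has its own queue, a stalled email worker pool only backs up the email queue; push and SMS jobs keep draining.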
Step 3 — Data model
User prefs: user_id, channel, enabled, quiet_hours, locale
Devices: user_id, device_token, platform (ios|android), active
Template: id, channel, locale, body_template
Notification log: id, user_id, channel, status, provider_msg_id, ts
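As a rough sketch, the four records map onto plain dataclasses. The field names follow the list above; the types, the defaults, the representation of `quiet_hours` as a (start, end) pair of local times, and the `allows` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional, Tuple


@dataclass
class UserPref:
    user_id: str
    channel: str
    enabled: bool = True
    # Assumed representation: (start, end) in the user's local time.
    quiet_hours: Optional[Tuple[time, time]] = None
    locale: str = "en"

    def allows(self, at: time) -> bool:
        """True if this channel may be used at local time `at`."""
        if not self.enabled:
            return False
        if self.quiet_hours is None:
            return True
        start, end = self.quiet_hours
        if start <= end:
            quiet = start <= at < end
        else:
            # Window wraps past midnight, e.g. 22:00 to 07:00.
            quiet = at >= start or at < end
        return not quiet


@dataclass
class Device:
    user_id: str
    device_token: str
    platform: str  # "ios" or "android"
    active: bool = True


@dataclass
class Template:
    id: str
    channel: str
    locale: str
    body_template: str


@dataclass
class NotificationLog:
    id: str
    user_id: str
    channel: str
    status: str  # assumed values, e.g. "queued", "sent", "delivered", "bounced"
    provider_msg_id: Optional[str]
    ts: datetime
```

The wrap-around branch matters in practice: most quiet-hours windows cross midnight, and a naive `start <= at < end` check would never match them.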
Step 4 — Reliability and dedup
- At-least-once via durable queues + retries → workers/providers may duplicate, so dedup by an idempotency key (e.g. notification_id) before sending, and store sent ids.
- Retries with backoff on provider errors; dead-letter after max attempts.
- Status tracking via provider callbacks/webhooks (delivered, bounced, opened) updates the log.
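A worker's send loop combining the dedup check, retries with backoff, and dead-lettering might look like the sketch below. Everything here is illustrative: `ProviderError`, `deliver`, and `backoff_delay` are hypothetical names, the in-memory `set` and `list` stand in for a shared store and a real dead-letter queue, and "full jitter" exponential backoff is one common choice rather than something the design above prescribes.

```python
import random
import time


class ProviderError(Exception):
    """Hypothetical transient failure raised by a provider call."""


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with full jitter; `attempt` starts at 0."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def deliver(job_id, send, sent_ids, dead_letters, max_attempts=4, sleep=time.sleep):
    """Send at most once per job_id; retry transient errors, then dead-letter."""
    # Dedup: a redelivered queue message with a known id is dropped.
    if job_id in sent_ids:
        return "duplicate"

    for attempt in range(max_attempts):
        try:
            send()
        except ProviderError:
            # Wait before the next try, but not after the final failure.
            if attempt + 1 < max_attempts:
                sleep(backoff_delay(attempt))
            continue
        sent_ids.add(job_id)  # record the id only after a successful send
        return "sent"

    dead_letters.append(job_id)  # retries exhausted: park for inspection
    return "dead-lettered"
```

Two caveats on the sketch: with several workers the check-then-record step would need to be an atomic set-if-absent in a shared store, and a crash between `send()` and recording the id can still produce a duplicate. That residual window is the at-least-once trade-off the section describes.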
Step 5 — Fan-out and rate control
- Large fan-out (“notify 1M followers”) → don’t do it inline; enqueue a fan-out job that expands the audience in batches onto the per-channel queues (reuse the feed fan-out pattern).
- Per-user rate limiting / batching → cap notifications per user per window and digest low-priority ones (Chapter 4 rate limiter), so users aren’t spammed.
- Priority queues → urgent (2FA codes) jump ahead of marketing blasts.
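The three controls above are small enough to sketch together. This is an in-memory illustration under stated assumptions: `batches`, `PriorityJobQueue`, and `PerUserLimiter` are hypothetical names, the limiter uses a sliding window (one of several reasonable algorithms), and a real deployment would keep this state in the broker and a shared store rather than in process memory.

```python
import heapq
import itertools
from collections import defaultdict, deque

URGENT, BULK = 0, 1  # lower number is served first


def batches(user_ids, size):
    """Yield the audience in fixed-size chunks so expansion is never inline."""
    it = iter(user_ids)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


class PriorityJobQueue:
    """Heap ordered by (priority, arrival order): FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps arrival order

    def put(self, priority, job):
        heapq.heappush(self._heap, (priority, next(self._seq), job))

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


class PerUserLimiter:
    """Sliding window: at most `limit` sends per user per `window` seconds."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self._sent = defaultdict(deque)  # user_id -> recent send timestamps

    def allow(self, user_id, now):
        stamps = self._sent[user_id]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False  # caller could defer this one into a digest
        stamps.append(now)
        return True
```

For example, a fan-out job would iterate `batches(follower_ids, 1000)` and `put(BULK, ...)` each chunk, while a 2FA code goes in with `URGENT` and is popped first even if a large blast is already queued.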
Step 6 — Provider integration
Wrap each provider behind an adapter (Strategy pattern) so channels/providers swap freely; add a circuit breaker so a failing provider sheds load and fails fast instead of piling up; keep fallbacks (push fails → email).
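One way to picture the adapter, circuit breaker, and fallback chain working together is the sketch below. The `Provider` protocol, `CircuitBreaker`, `CircuitOpen`, and `send_with_fallback` are hypothetical names for illustration; the thresholds are arbitrary, and the breaker is a simplified single-threaded version (consecutive-failure count, timed reset, one trial call when the timer expires).

```python
import time
from typing import Protocol


class Provider(Protocol):
    """Adapter interface every channel provider is wrapped behind."""

    def send(self, to: str, body: str) -> str: ...


class CircuitOpen(Exception):
    """Raised when the breaker rejects a call without contacting the provider."""


class CircuitBreaker:
    """Wraps a Provider; fails fast after repeated consecutive failures."""

    def __init__(self, provider, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self._provider = provider
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def send(self, to, body):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpen("provider circuit is open")
            # Reset period elapsed: fall through and allow one trial call.
        try:
            msg_id = self._provider.send(to, body)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open (or re-open) the circuit
            raise
        self._failures = 0
        self._opened_at = None
        return msg_id


def send_with_fallback(chain, addresses, body):
    """Try each (channel, provider) in order; return the first success."""
    for channel, provider in chain:
        try:
            return channel, provider.send(addresses[channel], body)
        except Exception:
            continue  # includes CircuitOpen: move to the next channel
    raise RuntimeError("all providers failed")
```

Since the breaker exposes the same `send` method as the provider it wraps, callers and the fallback chain do not need to know whether a given provider is guarded or not.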
Trade-offs to raise
- At-least-once + dedup vs the cost of exactly-once.
- Latency vs batching — urgent sent immediately; bulk batched for efficiency.
- Provider coupling — abstract behind adapters; tolerate provider outages with circuit breakers and fallbacks.
The interview cue
“Notification API applies preferences/quiet-hours, renders templates, and enqueues per channel; per-channel worker pools call providers with retries + backoff + DLQ, dedup by idempotency key, circuit breakers for flaky providers, priority queues for urgent vs bulk, and a fan-out job for large audiences.” Queue-per-channel + dedup + preferences is the core; the worker and fan-out internals come next.