Designing a notification service

Build a system that delivers push, SMS, and email to millions of users across third-party channels — reliably, without duplicates, and respecting preferences.

The problem

Design a notification/fan-out service: a central system other services call to send messages to users across multiple channels — mobile push (APNs/FCM), SMS, email, in-app. It must handle huge volume, integrate flaky third-party providers, dedupe, respect user preferences, and not get blocked by one slow channel.

Step 1 — Requirements

Functional: accept a “notify user U with content C via channels X” request; deliver via the right provider per channel; respect user preferences (opt-outs, quiet hours); support templates and localization; track delivery status; handle high-volume fan-out (e.g. “notify all followers”).

Non-functional: scalability (millions/sec at peak), reliability (at-least-once delivery), low latency for urgent notifications, fault isolation (a dead email provider mustn’t stall push), and no spam (dedup, rate-limit per user).

Step 2 — The architecture (queue-centric)

This is a textbook async, queue-driven design:

services → Notification API → validate + apply preferences → enqueue
                                                                │
              ┌──────────────── Message queue (per channel) ────┤
              ▼                 ▼                  ▼
        push workers       SMS workers       email workers
              │                 │                  │
          APNs/FCM          Twilio            SES/SendGrid   (3rd-party providers)

API validates, looks up preferences and device tokens, renders the template, and enqueues per channel.
Per-channel queues + worker pools isolate failures and let each channel scale and rate-limit independently.
Workers call the provider, handle retries, and record status.
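
The API step above can be sketched roughly as follows. This is a minimal in-memory illustration, not a production design: `PREFS`, `TEMPLATES`, `QUEUES`, and `notify` are hypothetical names, and a `deque` stands in for a durable per-channel queue. Quiet hours and device-token lookup are left out to keep it short.

```python
from collections import deque
from string import Template

# Hypothetical in-memory stand-ins for the preference store, the template
# store, and the durable per-channel queues.
PREFS = {("u1", "push"): True, ("u1", "email"): True, ("u1", "sms"): False}
TEMPLATES = {
    ("welcome", "push"): Template("Hi $name!"),
    ("welcome", "email"): Template("Hello $name, welcome aboard."),
    ("welcome", "sms"): Template("Welcome, $name"),
}
QUEUES = {"push": deque(), "sms": deque(), "email": deque()}


def notify(user_id, template_id, params, channels):
    """Validate, apply preferences, render, and enqueue one message per channel."""
    # Validate the whole request before enqueuing anything.
    unknown = [c for c in channels if c not in QUEUES]
    if unknown:
        raise ValueError(f"unknown channels: {unknown}")

    enqueued = []
    for channel in channels:
        # Assumption: a missing preference row is treated as opted out.
        if not PREFS.get((user_id, channel), False):
            continue
        body = TEMPLATES[(template_id, channel)].substitute(params)
        QUEUES[channel].append({"user_id": user_id, "channel": channel, "body": body})
        enqueued.append(channel)
    return enqueued
```

Calling `notify("u1", "welcome", {"name": "Ada"}, ["push", "sms", "email"])` with the sample data enqueues on push and email and skips SMS, because that user has SMS disabled. Each channel's workers then drain only their own queue.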

Step 3 — Data model

User prefs: user_id, channel, enabled, quiet_hours, locale
Devices:    user_id, device_token, platform (ios|android), active
Template:   id, channel, locale, body_template
Notification log: id, user_id, channel, status, provider_msg_id, ts
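
As one possible concrete shape for these records, here is a sketch using Python dataclasses. The field names come from the list above; the types, the default values, the example status strings, and the `in_quiet_hours` helper are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Tuple


@dataclass
class UserPref:
    user_id: str
    channel: str                      # "push" | "sms" | "email" | "in_app"
    enabled: bool = True
    # Assumed representation: (start, end) in the user's local time.
    quiet_hours: Optional[Tuple[time, time]] = None
    locale: str = "en"


@dataclass
class Device:
    user_id: str
    device_token: str
    platform: str                     # "ios" | "android"
    active: bool = True


@dataclass
class NotificationTemplate:
    id: str
    channel: str
    locale: str
    body_template: str


@dataclass
class NotificationLog:
    id: str
    user_id: str
    channel: str
    status: str                       # assumed values: "queued", "sent", "delivered", "bounced"
    provider_msg_id: Optional[str]
    ts: float


def in_quiet_hours(pref: UserPref, now: time) -> bool:
    """True if `now` is inside the quiet window, including windows that wrap midnight."""
    if pref.quiet_hours is None:
        return False
    start, end = pref.quiet_hours
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # e.g. 22:00 to 07:00
```

The wrap-around branch matters because the most common quiet window (late evening to early morning) crosses midnight, so a plain `start <= now < end` test would never match it.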

Step 4 — Reliability and dedup

At-least-once via durable queues + retries → a redelivered or retried message can be sent twice, so dedup on an idempotency key (e.g. notification_id): check the key before sending and record it once the provider accepts the message.
Retries with backoff on provider errors; dead-letter after max attempts.
Status tracking via provider callbacks/webhooks (delivered, bounced, opened) updates the log.
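
A worker loop combining the dedup check, retries with backoff, and the dead-letter step might look like the sketch below. Everything here is illustrative: `process`, `ProviderError`, `backoff_delay`, and `make_flaky_sender` are hypothetical names, a Python `set` stands in for a shared store of sent ids, and a list stands in for the dead-letter queue. Full-jitter exponential backoff is one common choice, not something the design above prescribes.

```python
import random
import time


class ProviderError(Exception):
    """Hypothetical transient failure raised by a provider call."""


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Exponential backoff with full jitter: random delay up to base * 2^attempt, capped."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def process(msg, send, sent_ids, dead_letters, max_attempts=4, sleep=time.sleep):
    """Send one message, skipping known ids and retrying transient provider errors."""
    key = msg["notification_id"]
    if key in sent_ids:
        return "duplicate"             # already delivered: drop the redelivery
    for attempt in range(max_attempts):
        try:
            send(msg)
        except ProviderError:
            if attempt + 1 < max_attempts:
                sleep(backoff_delay(attempt))
            continue
        sent_ids.add(key)              # record only after the provider accepted it
        return "sent"
    dead_letters.append(msg)           # retries exhausted: park for inspection
    return "dead-lettered"


def make_flaky_sender(failures):
    """Test double: a send function that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def send(msg):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ProviderError("simulated timeout")

    return send, state
```

Two caveats. The check-then-send here is not atomic, so with several workers the sent-id store would need an atomic set-if-absent operation. And a crash between the provider accepting the message and the id being recorded still produces a duplicate on redelivery, which is the residual cost of at-least-once.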

Step 5 — Fan-out and rate control

Large fan-out (“notify 1M followers”) → don’t do it inline; enqueue a fan-out job that expands the audience in batches onto the per-channel queues (reuse the feed fan-out pattern).
Per-user rate limiting / batching → cap notifications per user per window and roll low-priority ones into a digest (see the Chapter 4 rate limiter), so users aren’t spammed.
Priority queues → urgent (2FA codes) jump ahead of marketing blasts.
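
The three mechanisms above can each be sketched in a few lines. These are illustrative stand-ins with hypothetical names (`batches`, `SlidingWindowLimiter`, `PriorityQueue`); the sliding-window limiter is an assumption chosen for brevity and is not necessarily the algorithm the Chapter 4 rate limiter uses. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
import heapq
import itertools
from collections import defaultdict, deque

URGENT, BULK = 0, 1  # lower number is served first


def batches(user_ids, size):
    """Yield the audience in fixed-size chunks so the fan-out job never expands it inline."""
    it = iter(user_ids)
    while chunk := list(itertools.islice(it, size)):
        yield chunk


class SlidingWindowLimiter:
    """Allow at most `limit` notifications per user in any `window`-second span."""

    def __init__(self, limit, window):
        self.limit, self.window = limit, window
        self.sent = defaultdict(deque)   # user_id -> timestamps of recent sends

    def allow(self, user_id, now):
        stamps = self.sent[user_id]
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()             # forget sends that have left the window
        if len(stamps) >= self.limit:
            return False                 # caller may drop it or fold it into a digest
        stamps.append(now)
        return True


class PriorityQueue:
    """Heap ordered by (priority, arrival order) so urgent messages jump ahead of bulk."""

    def __init__(self):
        self._heap, self._seq = [], itertools.count()

    def put(self, priority, msg):
        heapq.heappush(self._heap, (priority, next(self._seq), msg))

    def get(self):
        return heapq.heappop(self._heap)[2]
```

The sequence counter in `PriorityQueue` keeps messages of equal priority in arrival order and avoids comparing message payloads when priorities tie. In a real deployment, separate urgent and bulk queues with dedicated workers achieve the same effect without a shared heap.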

Step 6 — Provider integration

Wrap each provider behind an adapter with a common interface (Strategy pattern) so providers can be swapped without touching the workers; add a circuit breaker so a failing provider fails fast and sheds load instead of letting requests pile up; keep fallbacks (push fails → email).
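
A rough sketch of how the adapter interface, circuit breaker, and fallback chain could fit together is shown below. All names (`Provider`, `CircuitBreaker`, `send_with_fallback`, `FakeProvider`) are hypothetical, the thresholds are arbitrary, and `FakeProvider` is a stub standing in for a real provider adapter. No actual provider SDK is called.

```python
import time
from typing import Protocol


class Provider(Protocol):
    """Common interface every channel adapter is assumed to implement."""
    def send(self, to: str, body: str) -> str: ...


class CircuitOpen(Exception):
    """Raised instead of calling a provider whose breaker is open."""


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call after `cooldown` s."""

    def __init__(self, provider, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.provider, self.threshold, self.cooldown = provider, threshold, cooldown
        self.clock, self.failures, self.opened_at = clock, 0, None

    def send(self, to, body):
        if self.opened_at is not None and self.clock() - self.opened_at < self.cooldown:
            raise CircuitOpen("failing fast")    # shed load without touching the provider
        try:
            msg_id = self.provider.send(to, body)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()    # open, or re-open after a failed trial
            raise
        self.failures, self.opened_at = 0, None  # success closes the breaker
        return msg_id


def send_with_fallback(chain, body):
    """Try each (channel, provider, address) in order; return the first success."""
    for channel, provider, address in chain:
        try:
            return channel, provider.send(address, body)
        except Exception:
            continue                             # fall through to the next channel
    raise RuntimeError("all channels failed")


class FakeProvider:
    """Stub adapter used only to exercise the sketch."""

    def __init__(self, name, healthy=True):
        self.name, self.healthy, self.calls = name, healthy, 0

    def send(self, to, body):
        self.calls += 1
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        return f"{self.name}-{self.calls}"
```

Because the breaker exposes the same `send` method as the adapter it wraps, workers and the fallback chain do not need to know whether a provider is protected. Whether falling back from push to email is appropriate depends on the notification type; a 2FA code and a marketing message likely warrant different chains.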

Trade-offs to raise

At-least-once + dedup vs the cost of exactly-once.
Latency vs batching — urgent sent immediately; bulk batched for efficiency.
Provider coupling — abstract behind adapters; tolerate provider outages with circuit breakers and fallbacks.

The interview cue

“Notification API applies preferences/quiet-hours, renders templates, and enqueues per channel; per-channel worker pools call providers with retries + backoff + DLQ, dedup by idempotency key, circuit breakers for flaky providers, priority queues for urgent vs bulk, and a fan-out job for large audiences.” Queue-per-channel + dedup + preferences is the core; the worker and fan-out internals come next.