Designing UPI (Unified Payments Interface)
Real-time payments between any two banks — virtual payment addresses, the central switch routing a two-leg debit/credit, and reversal on failure.
The problem
Design UPI (India’s real-time payment rails, similar to other instant-payment systems): let anyone instantly pay anyone across different banks using a simple virtual address, with money debited from the payer’s bank and credited to the payee’s bank in seconds, 24/7. The crux is coordinating a two-bank transfer reliably, including reversing the debit when the credit leg fails.
Step 1 — Requirements
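Functional: link a bank account to a VPA (virtual payment address, e.g. alice@bank); pay by VPA/QR; request money; instant inter-bank transfer (customer accounts are debited and credited in real time, while the banks settle net positions with each other afterwards); balance check.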
Non-functional: real-time (seconds), 24/7, extremely reliable (no money lost or double-moved across banks), strong consistency for the transfer, massive scale (billions of transactions/month), and highly available, with correctness taking priority whenever the two conflict.
Step 2 — The players
Payer app ↔ Payer's bank (PSP) ─┐
                                ├─ UPI Switch (central, e.g. NPCI)
Payee app ↔ Payee's bank (PSP) ─┘
- VPA decouples identity from bank account numbers — a directory maps VPA → bank + account (resolved at pay time; the account number is never exposed).
- The central switch routes and coordinates between the payer’s and payee’s banks. It’s the orchestrator that makes “any bank to any bank” work without N² integrations.
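The directory lookup described above can be sketched in a few lines of Python. This is only an illustration: the `VpaDirectory` class, the `AccountRef` fields, and the sample handles and bank codes are hypothetical, and an in-memory dict stands in for whatever store a real directory would use.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AccountRef:
    bank_code: str      # tells the switch which bank to route to
    account_token: str  # opaque handle; the raw account number stays at the bank


class VpaDirectory:
    """Hypothetical in-memory directory mapping VPA -> bank + account."""

    def __init__(self) -> None:
        self._entries: Dict[str, AccountRef] = {}

    def link(self, vpa: str, ref: AccountRef) -> None:
        # Linking (or re-linking) a VPA simply overwrites the mapping.
        self._entries[vpa] = ref

    def resolve(self, vpa: str) -> Optional[AccountRef]:
        # Called at pay time; None means the VPA is unknown.
        return self._entries.get(vpa)


directory = VpaDirectory()
directory.link("alice@bank", AccountRef("BANK_A", "tok-001"))
directory.link("bob@otherbank", AccountRef("BANK_B", "tok-002"))

payee = directory.resolve("bob@otherbank")
unknown = directory.resolve("nobody@bank")
```

The payer only ever types `bob@otherbank`; the bank code and account token are visible to the switch and the banks, not to the other party.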
Step 3 — The transfer flow (two legs, coordinated)
A payment is fundamentally a debit at one bank plus a credit at another. The two banks are independent systems, so the switch coordinates both legs and ties them together with a shared reference ID:
1. resolve VPA → payee bank/account
2. switch → payer bank: DEBIT payer (with a unique txn ref)
3. on debit success → switch → payee bank: CREDIT payee (same ref)
4. both succeed → SUCCESS to both apps
5. credit fails after debit succeeded → REVERSE the debit (refund payer)
This is a distributed transaction across independent banks. Since a true 2-phase commit across banks is impractical, it’s a saga: debit, then credit, compensate (reverse) on failure — with idempotency and a definitive reconciliation path.
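The debit, credit, compensate sequence above can be sketched as a small Python coordinator. Everything here is an assumption made for illustration: the `Bank` class is a stand-in for a real bank API, amounts are plain integers in minor units, and the sketch deliberately leaves out timeouts and retry deduplication so that only the saga shape is visible.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Outcome(Enum):
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"      # debit never happened, nothing to undo
    REVERSED = "REVERSED"  # debit happened, credit failed, payer refunded


class LegFailed(Exception):
    """A bank definitively refused a leg."""


@dataclass
class Bank:
    """Hypothetical stand-in for one bank's ledger API."""
    balances: Dict[str, int]
    reject_credits: bool = False  # test hook to simulate a failing credit leg

    def debit(self, account: str, amount: int, ref: str) -> None:
        if self.balances.get(account, 0) < amount:
            raise LegFailed("insufficient funds")
        self.balances[account] -= amount

    def credit(self, account: str, amount: int, ref: str) -> None:
        if self.reject_credits or account not in self.balances:
            raise LegFailed("credit rejected")
        self.balances[account] += amount

    def reverse_debit(self, account: str, amount: int, ref: str) -> None:
        # Compensating action: put the money back with the payer.
        self.balances[account] += amount


def transfer(payer_bank: Bank, payer: str,
             payee_bank: Bank, payee: str, amount: int) -> Outcome:
    ref = uuid.uuid4().hex  # one reference ties both legs together
    try:
        payer_bank.debit(payer, amount, ref)
    except LegFailed:
        return Outcome.FAILED  # no credit is attempted
    try:
        payee_bank.credit(payee, amount, ref)
    except LegFailed:
        payer_bank.reverse_debit(payer, amount, ref)
        return Outcome.REVERSED
    return Outcome.SUCCESS


bank_a = Bank({"alice": 500})
bank_b = Bank({"bob": 0})
ok = transfer(bank_a, "alice", bank_b, "bob", 200)
bank_b.reject_credits = True
reversed_txn = transfer(bank_a, "alice", bank_b, "bob", 100)
failed_txn = transfer(bank_a, "alice", bank_b, "bob", 10_000)
```

After these three calls the payer holds 300 and the payee 200: the second transfer was debited and then refunded, and the third never left the payer's bank.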
Step 4 — Idempotency and reversal (correctness core)
- Idempotency — every leg carries the unique transaction reference; retries (common, given timeouts) must not double-debit or double-credit. Each bank dedups on the ref.
- Timeouts & reversal — networks between banks time out. If the switch is unsure whether a debit happened, it must resolve definitively (status check) and reverse if needed — never leave money missing. Auto-reversal on failed/uncertain legs is essential.
- Reconciliation — banks and the switch reconcile transaction logs continuously to catch and repair any stuck/mismatched transfers.
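The three points above can be shown together in one hedged Python sketch. The `PayerBank` class, its method names, and the status strings are hypothetical. One simplification to flag: a real bank must also refuse a debit that arrives late for a reference it has already reported as not found, and the sketch ignores that race.

```python
from enum import Enum
from typing import Dict, List, Set


class LegStatus(Enum):
    DONE = "DONE"
    NOT_FOUND = "NOT_FOUND"


class PayerBank:
    """Hypothetical bank that dedups debits on the transaction reference."""

    def __init__(self, balances: Dict[str, int]) -> None:
        self.balances = dict(balances)
        self.debits: Dict[str, int] = {}  # ref -> amount already debited
        self.reversed_refs: Set[str] = set()

    def debit(self, account: str, amount: int, ref: str) -> None:
        if ref in self.debits:
            return  # retry of a leg already applied: do nothing
        if self.balances[account] < amount:
            raise ValueError("insufficient funds")
        self.balances[account] -= amount
        self.debits[ref] = amount

    def status(self, ref: str) -> LegStatus:
        return LegStatus.DONE if ref in self.debits else LegStatus.NOT_FOUND

    def reverse(self, account: str, ref: str) -> None:
        if ref not in self.debits or ref in self.reversed_refs:
            return  # nothing to undo, or already undone
        self.balances[account] += self.debits[ref]
        self.reversed_refs.add(ref)


def resolve_uncertain_debit(bank: PayerBank, account: str, ref: str) -> str:
    """After a timeout: ask the bank what happened, reverse if it debited."""
    if bank.status(ref) is LegStatus.DONE:
        bank.reverse(account, ref)
        return "REVERSED"
    return "NOT_DEBITED"


def reconcile(switch_log: Dict[str, str], bank: PayerBank) -> List[str]:
    """Refs the bank debited and never reversed, but the switch never completed."""
    return sorted(
        ref for ref in bank.debits
        if ref not in bank.reversed_refs and switch_log.get(ref) != "SUCCESS"
    )


bank = PayerBank({"alice": 500})
bank.debit("alice", 200, "txn-1")
bank.debit("alice", 200, "txn-1")  # retry after a timeout
balance_after_retry = bank.balances["alice"]

stuck_before = reconcile({"txn-1": "PENDING"}, bank)
outcome = resolve_uncertain_debit(bank, "alice", "txn-1")
bank.reverse("alice", "txn-1")  # a repeated reversal is also a no-op
stuck_after = reconcile({"txn-1": "PENDING"}, bank)
never_sent = resolve_uncertain_debit(bank, "alice", "txn-2")
```

The retry leaves the balance at 300 rather than 100, the reconciliation pass flags `txn-1` until it is reversed, and reversing twice still refunds only once. Making the compensating action idempotent matters as much as making the debit idempotent, since reversals are retried too.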
Step 5 — Security
Strong authentication (device binding plus UPI PIN), end-to-end encryption, and the VPA indirection (no raw account numbers shared). Mention it: this is a payments system.
Step 6 — Scale
The switch is a high-throughput, highly available router: stateless coordinator nodes in front of a durable transaction log, horizontally scaled and deployed across regions. Banks keep their own ledgers, internally following the same pattern as a digital-wallet ledger.
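Stateless coordination only works if every transaction's progress lives in the log, so that any node can pick up where another left off. A minimal Python sketch of such a log follows. The state names and the `TxnLog` class are assumptions for illustration, and the in-memory lists stand in for a durable, replicated store.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical transaction states and the transitions allowed between them.
ALLOWED: Dict[str, Set[str]] = {
    "INITIATED": {"DEBITED", "FAILED"},
    "DEBITED": {"SUCCESS", "REVERSAL_PENDING"},
    "REVERSAL_PENDING": {"REVERSED"},
    "SUCCESS": set(),   # terminal
    "FAILED": set(),    # terminal
    "REVERSED": set(),  # terminal
}


class TxnLog:
    """Append-only log of (ref, state) events plus a latest-state index."""

    def __init__(self) -> None:
        self._events: List[Tuple[str, str]] = []
        self._latest: Dict[str, str] = {}

    def _append(self, ref: str, state: str) -> None:
        self._events.append((ref, state))
        self._latest[ref] = state

    def start(self, ref: str) -> None:
        if ref in self._latest:
            raise ValueError(f"duplicate ref {ref}")
        self._append(ref, "INITIATED")

    def advance(self, ref: str, new_state: str) -> None:
        current = self._latest[ref]
        if new_state not in ALLOWED[current]:
            raise ValueError(f"illegal transition {current} -> {new_state}")
        self._append(ref, new_state)

    def state(self, ref: str) -> str:
        return self._latest[ref]

    def unfinished(self) -> List[str]:
        # Non-terminal transactions: work for a recovery or reconciliation job.
        return sorted(r for r, s in self._latest.items() if ALLOWED[s])


log = TxnLog()
log.start("txn-1")
log.advance("txn-1", "DEBITED")
log.advance("txn-1", "SUCCESS")

log.start("txn-2")
log.advance("txn-2", "DEBITED")  # coordinator crashes here

pending = log.unfinished()
try:
    log.advance("txn-1", "REVERSED")  # a completed transfer cannot be reversed
    rejected = False
except ValueError:
    rejected = True
```

Because `txn-2` is left in `DEBITED`, any other coordinator node can find it through `unfinished()` and either finish the credit or start a reversal, with no in-memory state carried over from the crashed node.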
Trade-offs to raise
- Saga (debit→credit→reverse) over true 2PC across banks — 2PC across independent institutions is impractical; saga + reversal + reconciliation is the realistic model.
- Strong consistency (CP) — never lose/duplicate money; prefer reversal/pending over ambiguity.
- Latency vs certainty — confirm legs before declaring success; reverse on doubt.
The interview cue
“VPA decouples identity from account; a central switch coordinates a two-leg debit-then-credit across the payer’s and payee’s banks as a saga with a unique txn reference; idempotency prevents double-moves on retries, timeouts trigger reversal, and continuous reconciliation repairs stragglers. CP — correctness over availability.” Switch-coordinated two-leg transfer + idempotency + reversal is the answer; implementation next.