Designing UPI (Unified Payments Interface)

Real-time payments between any two banks — virtual payment addresses, the central switch routing a two-leg debit/credit, and reversal on failure.

The problem

Design UPI (India’s real-time payment rail, similar to other instant-payment systems): let anyone instantly pay anyone across different banks using a simple virtual address, with money debited from the payer’s bank and credited to the payee’s bank in seconds, 24/7. The crux is coordinating a two-bank transfer reliably and reversing it cleanly on failure.

Step 1 — Requirements

Functional: link a bank account to a VPA (virtual payment address, e.g. alice@bank); pay by VPA/QR; request money; instant inter-bank settlement; balance check.

Non-functional: real-time (seconds), 24/7, extremely reliable (no money lost or double-moved across banks), strong consistency for the transfer, massive scale (billions of transactions/month), and highly available.

Step 2 — The players

Payer app  ↔  Payer's bank (PSP)        ┐
                                         ├─ UPI Switch (central, e.g. NPCI)
Payee app  ↔  Payee's bank (PSP)        ┘

VPA decouples identity from bank account numbers — a directory maps VPA → bank + account (resolved at pay time; the account number is never exposed).
The central switch routes and coordinates between the payer’s and payee’s banks. It’s the orchestrator that makes “any bank to any bank” work without N² integrations.
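To make the VPA indirection concrete, here is a minimal sketch of a directory lookup. Everything in it is an assumption for illustration: the `VpaDirectory` and `AccountRef` names, the in-memory dict, and the sample bank code and account id are hypothetical, not how any real directory is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccountRef:
    bank_code: str    # which bank holds the account (hypothetical code)
    account_id: str   # internal account number, never shown to the payer


class VpaDirectory:
    """Hypothetical in-memory directory mapping VPA -> bank + account."""

    def __init__(self) -> None:
        self._entries: dict[str, AccountRef] = {}

    def link(self, vpa: str, bank_code: str, account_id: str) -> None:
        # Linking stores the mapping; the payer only ever sees the VPA.
        self._entries[vpa] = AccountRef(bank_code, account_id)

    def resolve(self, vpa: str) -> AccountRef:
        # Called at pay time by the switch side, not by the payer app.
        try:
            return self._entries[vpa]
        except KeyError:
            raise LookupError(f"unknown VPA: {vpa}") from None


directory = VpaDirectory()
directory.link("alice@bank", bank_code="BANK_A", account_id="0012345")
payee = directory.resolve("alice@bank")
```

The payer app sends only `alice@bank`; the resolved `AccountRef` stays on the switch/bank side, which is what keeps the account number unexposed.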

Step 3 — The transfer flow (two legs, coordinated)

A payment is fundamentally a debit at one bank plus a credit at another. The two legs run on separate systems, so the switch coordinates them and ties them together with a shared reference ID:

1. resolve VPA → payee bank/account
2. switch → payer bank: DEBIT payer (with a unique txn ref)
3. on debit success → switch → payee bank: CREDIT payee (same ref)
4. both succeed → SUCCESS to both apps
5. credit fails after debit succeeded → REVERSE the debit (refund payer)

This is a distributed transaction across independent banks. Since a true two-phase commit (2PC) across banks is impractical, it’s a saga: debit, then credit, then compensate (reverse) on failure, backed by idempotency and a definitive reconciliation path.
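The debit, credit, and reverse steps above can be sketched as a small saga. This is a toy under stated assumptions: the `Bank` class, `BankError`, the `Status` values, and the `transfer` function are hypothetical names, banks are in-memory dicts, amounts are integers (paise), and every call returns a definite answer. Retry deduplication and timeouts are deliberately left out to keep the control flow visible.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"      # debit refused, no money moved
    REVERSED = "REVERSED"  # debit undone because the credit failed


class BankError(Exception):
    """Raised when a bank definitively refuses a leg."""


@dataclass
class Bank:
    balances: dict[str, int]  # account -> balance in paise

    def _move(self, account: str, delta: int) -> None:
        if account not in self.balances:
            raise BankError("unknown account")
        if self.balances[account] + delta < 0:
            raise BankError("insufficient funds")
        self.balances[account] += delta

    def debit(self, ref: str, account: str, amount: int) -> None:
        self._move(account, -amount)

    def credit(self, ref: str, account: str, amount: int) -> None:
        self._move(account, amount)

    def reverse_debit(self, ref: str, account: str, amount: int) -> None:
        self._move(account, amount)  # compensating action: refund the payer


def transfer(payer_bank, payer_acct, payee_bank, payee_acct, amount):
    ref = uuid.uuid4().hex  # one reference ties both legs together
    try:
        payer_bank.debit(ref, payer_acct, amount)       # leg 1
    except BankError:
        return ref, Status.FAILED                       # nothing to undo
    try:
        payee_bank.credit(ref, payee_acct, amount)      # leg 2, same ref
    except BankError:
        payer_bank.reverse_debit(ref, payer_acct, amount)  # compensate
        return ref, Status.REVERSED
    return ref, Status.SUCCESS


bank_a = Bank({"alice": 1000})
bank_b = Bank({"bob": 0})
_, ok = transfer(bank_a, "alice", bank_b, "bob", 300)        # both legs succeed
_, undone = transfer(bank_a, "alice", bank_b, "ghost", 300)  # credit fails
_, refused = transfer(bank_a, "alice", bank_b, "bob", 5000)  # debit fails
```

In the second call the payer is debited and then refunded, so the balance ends where it was after the first call. A real switch would also persist each step before moving on, so a crash between the legs can be resumed.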

Step 4 — Idempotency and reversal (correctness core)

Idempotency: every leg carries the unique transaction reference, so retries (common, given timeouts) must not double-debit or double-credit. Each bank deduplicates on the ref.
Timeouts and reversal: networks between banks time out. If the switch is unsure whether a leg happened, it must resolve the outcome definitively (status check) and reverse if needed, never leaving money missing. Automatic reversal of a debit whose credit failed is essential; an uncertain leg is resolved first, because refunding the payer while the credit may still land would duplicate money.
Reconciliation: banks and the switch reconcile transaction logs continuously to catch and repair any stuck or mismatched transfers.
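The three mechanisms above can be illustrated together in a minimal sketch. All names here are assumptions made for illustration: `IdempotentBank`, the `(ref, leg)` dedup key, the `"OK"`/`"FAILED"`/`"NOT_FOUND"` outcomes, and the `reconcile` helper are hypothetical, and the dedup table is an in-memory dict where a real bank would use durable storage.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentBank:
    balances: dict[str, int]
    # (ref, leg) -> outcome of the first attempt; retries replay it.
    seen: dict[tuple[str, str], str] = field(default_factory=dict)

    def apply(self, ref: str, leg: str, account: str, delta: int) -> str:
        key = (ref, leg)
        if key in self.seen:
            return self.seen[key]  # retry: same answer, no money moved again
        if account not in self.balances or self.balances[account] + delta < 0:
            outcome = "FAILED"
        else:
            self.balances[account] += delta
            outcome = "OK"
        self.seen[key] = outcome
        return outcome

    def status(self, ref: str, leg: str) -> str:
        """Status check the switch uses to resolve a timed-out call."""
        return self.seen.get((ref, leg), "NOT_FOUND")


def reconcile(refs, payer_bank, payee_bank):
    """Return refs where the payer was debited but neither credited nor refunded."""
    stuck = []
    for ref in refs:
        debited = payer_bank.status(ref, "DEBIT") == "OK"
        credited = payee_bank.status(ref, "CREDIT") == "OK"
        refunded = payer_bank.status(ref, "REVERSAL") == "OK"
        if debited and not credited and not refunded:
            stuck.append(ref)
    return stuck


payer = IdempotentBank({"alice": 1000})
payee = IdempotentBank({"bob": 0})

payer.apply("txn-1", "DEBIT", "alice", -300)
payer.apply("txn-1", "DEBIT", "alice", -300)  # retry after a timeout
balance_after_retry = payer.balances["alice"]

# The credit leg never reached the payee bank, so the transfer is stuck.
stuck = reconcile(["txn-1"], payer, payee)
for ref in stuck:
    payer.apply(ref, "REVERSAL", "alice", 300)  # repair: refund the payer
still_stuck = reconcile(["txn-1"], payer, payee)
```

Because the reversal is keyed on the same ref, running the repair loop twice would still refund the payer only once.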

Step 5 — Security

Strong auth (device binding + UPI PIN), end-to-end encryption, and the VPA indirection (no raw account numbers shared). Mention it: it’s a payments system.

Step 6 — Scale

The switch is a high-throughput, highly-available router (stateless coordination + durable transaction log), horizontally scaled and multi-region; banks handle their own ledgers (the digital-wallet/ledger pattern internally).

Trade-offs to raise

Saga (debit→credit→reverse) over true 2PC across banks: 2PC across independent institutions is impractical; saga + reversal + reconciliation is the realistic model.
Strong consistency (CP): never lose or duplicate money; prefer reversal/pending over ambiguity.
Latency vs certainty: confirm both legs before declaring success; when a leg’s outcome is in doubt, hold the payment as pending, resolve it, and reverse if needed.

The interview cue

“VPA decouples identity from account; a central switch coordinates a two-leg debit-then-credit across the payer’s and payee’s banks as a saga with a unique txn reference; idempotency prevents double-moves on retries, timeouts trigger reversal, and continuous reconciliation repairs stragglers. CP — correctness over availability.” Switch-coordinated two-leg transfer + idempotency + reversal is the answer; implementation next.