System design course
Ch. 4 · Designing real systems · How to build it · 6 min read

Building UPI (Unified Payments Interface)

Implement the switch's debit-then-credit saga with a unique reference, idempotent bank legs, and timeout-driven reversal.


VPA resolution

A payment starts by resolving the payee’s virtual payment address (VPA) to a bank and an account token; the account number is never exposed to the payer:

def resolve(vpa):
    rec = vpa_directory.get(vpa)                      # alice@bank → {bank, account_token}
    if not rec: raise InvalidVPA()
    return rec

The switch saga: debit, then credit, reverse on failure

The switch coordinates the two banks with a single transaction reference and durable state, so it can resume/resolve after any timeout:

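def pay(payer_vpa, payee_vpa, amount, idempotency_key):
    payer = resolve(payer_vpa); payee = resolve(payee_vpa)   # InvalidVPA is raised before anything is logged
    ref = txn_log.create(ref=uuid(), payer=payer_vpa, payee=payee_vpa,
                         amount=amount, state="initiated", key=idempotency_key)

    debit = bank_call(payer.bank, "DEBIT", ref, payer.account, amount)   # leg 1
    if debit.status == "TIMEOUT":                     # outcome unknown: the debit may have landed,
        return PENDING                                # so stay "initiated" and resolve via a STATUS query
    if debit.status != "SUCCESS":
        txn_log.set(ref, "failed"); return FAILED     # definite rejection, no money moved
    txn_log.set(ref, "debited")

    credit = bank_call(payee.bank, "CREDIT", ref, payee.account, amount) # leg 2
    if credit.status == "SUCCESS":
        txn_log.set(ref, "success"); return SUCCESS
    if credit.status == "TIMEOUT":                    # outcome unknown: the credit may have landed,
        return PENDING                                # so stay "debited" and let the reconciler decide
    reverse_debit(ref, payer, amount)                 # definite failure: compensate by refunding the payer
    txn_log.set(ref, "reversed"); return FAILED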

The durable txn_log (keyed by ref) is what lets the switch recover: after a crash or timeout it knows whether the debit happened and what to do.
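The recovery property described above can be sketched as a small state machine. This is a minimal in-memory illustration, not the switch's actual storage: the `TxnLog` class, the `ALLOWED` transition table, and `next_action` are hypothetical names, and a real switch would persist the states in a replicated database.

```python
# Hypothetical stand-in for the durable txn_log, keyed by ref.
# Allowed transitions mirror the states used in pay().
ALLOWED = {
    "initiated": {"debited", "failed"},
    "debited": {"success", "reversed"},
    "success": set(),
    "failed": set(),
    "reversed": set(),
}


class TxnLog:
    def __init__(self):
        self._state = {}  # ref -> current state

    def create(self, ref):
        if ref in self._state:
            raise ValueError(f"duplicate ref {ref}")
        self._state[ref] = "initiated"
        return ref

    def set(self, ref, new_state):
        current = self._state[ref]
        if new_state == current:
            return  # replaying the same transition is a no-op
        if new_state not in ALLOWED[current]:
            raise ValueError(f"illegal transition {current} -> {new_state}")
        self._state[ref] = new_state

    def next_action(self, ref):
        """What a recovering switch should do for this ref."""
        return {
            "initiated": "query payer bank for debit status",
            "debited": "query payee bank for credit status",
        }.get(self._state[ref], "nothing: terminal state")


log = TxnLog()
log.create("r1")
log.set("r1", "debited")
print(log.next_action("r1"))  # query payee bank for credit status
```

Rejecting illegal transitions means a late or duplicated update cannot move a finished transaction out of a terminal state.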

Idempotent bank legs

Each bank deduplicates on (ref, op), so retries (frequent, because of timeouts) never move money twice:

def bank_handle(op, ref, account, amount):            # inside a bank
    with txn():                                       # check and apply atomically, so concurrent retries can't both pass
        if ledger.applied(ref, op):                   # already did this leg?
            return ledger.result(ref, op)             # return the same outcome
        if op == "DEBIT":
            ok = conditional_debit(account, amount)    # no overdraft (wallet pattern)
            result = "SUCCESS" if ok else "INSUFFICIENT"
        else:  # CREDIT
            credit(account, amount); result = "SUCCESS"
        if result == "SUCCESS":                        # a rejected debit moved no money, so post nothing
            post_double_entry(ref, op, account, amount)    # the bank's own ledger
        ledger.mark_applied(ref, op, result)           # record rejections too, so a retry gets the same answer
    return result
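To see the dedup behaviour concretely, here is a minimal runnable sketch under simplifying assumptions: a hypothetical single-threaded in-memory `Bank` class stands in for the bank, with a dict in place of the ledger and no real database transaction.

```python
class Bank:
    """Toy bank: balances in minor units, dedup keyed on (ref, op)."""

    def __init__(self, balances):
        self.balances = dict(balances)  # account -> balance
        self.applied = {}               # (ref, op) -> recorded result

    def handle(self, op, ref, account, amount):
        key = (ref, op)
        if key in self.applied:         # retry of a leg we already processed
            return self.applied[key]
        if op == "DEBIT":
            if self.balances[account] >= amount:
                self.balances[account] -= amount
                result = "SUCCESS"
            else:
                result = "INSUFFICIENT"  # no overdraft, nothing moves
        elif op == "CREDIT":
            self.balances[account] += amount
            result = "SUCCESS"
        else:
            raise ValueError(f"unknown op {op}")
        self.applied[key] = result       # rejections are recorded as well
        return result


bank = Bank({"alice": 500})
first = bank.handle("DEBIT", "ref-1", "alice", 200)
retry = bank.handle("DEBIT", "ref-1", "alice", 200)  # same ref: no second debit
print(first, retry, bank.balances["alice"])  # SUCCESS SUCCESS 300
```

The retry returns the recorded outcome instead of debiting again, which is what makes timeouts safe to retry.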

Timeout-driven reversal

The dangerous case: the switch debits the payer but doesn’t hear back about the credit (timeout). It must resolve definitively — query status, then credit or reverse — never leave money missing:

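def resolve_pending():                                # background reconciler
    for tx in txn_log.stuck(state="debited", older_than="30s"):
        status = bank_call(payee_bank(tx), "STATUS", tx.ref).status
        if status == "SUCCESS":                       # the credit did land
            txn_log.set(tx.ref, "success")
        elif status in ("FAILED", "NOT_FOUND"):       # assumed status values: the credit definitely did not land
            reverse_debit(tx.ref, payer(tx), tx.amount)   # idempotent refund
            txn_log.set(tx.ref, "reversed")
        # anything else (timeout, still processing): stay "debited" and retry on the next pass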

reverse_debit itself carries the ref and is idempotent, so re-running the reconciler is safe.
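One way the payer bank might make the reversal idempotent is to treat it as its own operation keyed on the same ref. The sketch below is an assumption-laden illustration: `PayerBank`, the `"REVERSAL"` op name, and the `"NOTHING_TO_REVERSE"` result are hypothetical, and the in-memory dicts replace a real ledger.

```python
class PayerBank:
    def __init__(self, balances):
        self.balances = dict(balances)  # account -> balance
        self.applied = {}               # (ref, op) -> recorded result

    def debit(self, ref, account, amount):
        key = (ref, "DEBIT")
        if key not in self.applied:
            ok = self.balances[account] >= amount
            if ok:
                self.balances[account] -= amount
            self.applied[key] = "SUCCESS" if ok else "INSUFFICIENT"
        return self.applied[key]

    def reverse_debit(self, ref, account, amount):
        key = (ref, "REVERSAL")
        if key in self.applied:                 # already refunded for this ref
            return self.applied[key]
        if self.applied.get((ref, "DEBIT")) != "SUCCESS":
            # Nothing was taken, so nothing is refunded. This outcome is not
            # recorded, so a reversal sent after a late-arriving debit still works.
            return "NOTHING_TO_REVERSE"
        self.balances[account] += amount
        self.applied[key] = "SUCCESS"
        return "SUCCESS"


payer_bank = PayerBank({"alice": 500})
payer_bank.debit("ref-9", "alice", 200)
r1 = payer_bank.reverse_debit("ref-9", "alice", 200)
r2 = payer_bank.reverse_debit("ref-9", "alice", 200)  # reconciler ran twice
print(r1, r2, payer_bank.balances["alice"])  # SUCCESS SUCCESS 500
```

Refunding only against a recorded successful debit also guards against a reversal for a debit that never happened.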

Reconciliation

The switch and each bank periodically compare transaction logs by ref; any mismatch (debited-but-not-credited, duplicate) is flagged and repaired (credit the missing leg or reverse). This is the ultimate correctness guarantee across independent institutions.
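The comparison by ref can be sketched as a pure function over three exports. The shapes are assumptions for illustration: the switch log is a dict of ref to state, and each bank export is a list with one ref per ledger entry (debits net of reversals). The `reconcile` function and its issue labels are hypothetical.

```python
from collections import Counter


def reconcile(switch_states, payer_debits, payee_credits):
    """Return {ref: issue} for every ref on which the three views disagree."""
    debit_n, credit_n = Counter(payer_debits), Counter(payee_credits)
    issues = {}
    for ref in sorted(set(switch_states) | set(debit_n) | set(credit_n)):
        d, c = debit_n[ref], credit_n[ref]
        state = switch_states.get(ref)
        if d > 1 or c > 1:
            issues[ref] = "duplicate_entry"
        elif d != c:
            issues[ref] = "debited_not_credited" if d else "credited_not_debited"
        elif state is None:
            issues[ref] = "unknown_to_switch"
        elif (d == 1) != (state == "success"):
            issues[ref] = "state_mismatch"  # money moved but not "success", or the reverse
    return issues


switch_states = {"r1": "success", "r2": "debited", "r3": "success", "r4": "reversed"}
payer_debits = ["r1", "r2", "r3", "r3"]   # r4's debit was reversed, so it nets to nothing
payee_credits = ["r1", "r3"]
issues = reconcile(switch_states, payer_debits, payee_credits)
print(issues)  # {'r2': 'debited_not_credited', 'r3': 'duplicate_entry'}
```

Each flagged ref then feeds the repair path: credit the missing leg or reverse the debit.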

Scale and failure handling

  • Switch → stateless coordination + durable txn log, horizontally scaled, multi-region, HA (it’s the critical path for the whole network).
  • Retries/timeouts → idempotency by ref everywhere; reconciler resolves stuck txns.
  • Debit ok, credit fails → automatic reversal (payer refunded).
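  • Bank down → leg fails or times out → retry with the same ref, or reverse; reconcile.
  • CP (consistency over availability) → never declare success until both legs are confirmed; when in doubt, query status and reverse only once the credit is known not to have landed.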
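The retry bullet above could look like the following bounded-retry wrapper. It is a sketch under assumptions: `call_with_retries`, the `send` callable, and the `"TIMEOUT"` return value are hypothetical names, and the only property it relies on is that every attempt carries the same ref so the bank can dedup.

```python
import time


def call_with_retries(send, ref, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call send(ref) up to `attempts` times, backing off exponentially.

    Returns the bank's answer, or "TIMEOUT" when every attempt timed out,
    in which case the outcome is unknown and must be resolved later.
    """
    for attempt in range(attempts):
        try:
            return send(ref)  # same ref on every attempt
        except TimeoutError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return "TIMEOUT"


calls = []

def flaky_send(ref):
    """Times out twice, then answers."""
    calls.append(ref)
    if len(calls) < 3:
        raise TimeoutError
    return "SUCCESS"

outcome = call_with_retries(flaky_send, "ref-7", sleep=lambda s: None)
print(outcome, calls)  # SUCCESS ['ref-7', 'ref-7', 'ref-7']
```

Returning "TIMEOUT" rather than a failure keeps the caller from treating an unknown outcome as a definite rejection.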

The takeaway

Concrete signals: a switch-coordinated debit→credit saga keyed by a unique reference, idempotent bank legs (dedup on ref, no double-move), timeout-driven reversal, and reconciliation across institutions. It’s the digital-wallet ledger extended to a cross-bank distributed transaction — saga + idempotency + reversal is how money moves correctly between systems you don’t control.