
How Banks and FinTechs Integrate with Payment APIs

The full integration lifecycle — partnership onboarding, IP whitelisting, payer-type classification, routing, reconciliation, and OTC agent networks — from inside the payments aisle of a tier-1 bank.

The README of a payment API is the easy half. The hard half is the integration lifecycle — the months between “we have a partner” and “we have a settled, reconciled transaction every two seconds”. This post walks that lifecycle in the order it actually happens.

1. Partnership onboarding

Before any code runs, three documents are signed:

  • Master Services Agreement — liability split, SLA, audit rights.
  • Technical Schedule — endpoints, message formats, retry policy, error matrix.
  • Operational Schedule — settlement windows, dispute SLAs, reconciliation cadence, support escalation.

The Technical Schedule is the one engineering owns. Two clauses we’ve learned to make non-negotiable:

  1. Either side may rotate credentials at any time with 24-hour notice.
  2. Idempotency is mandatory on all state-changing endpoints.

Without (1), credential leaks become legal incidents. Without (2), retries under load cause double-debits, and double-debits cause regulatory letters.
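
To make clause (2) concrete, here is a minimal sketch of idempotent handling for a state-changing call. The names (IdempotentPoster, PaymentResult, post_fn) are hypothetical and not part of our API, and the in-memory dict stands in for whatever durable store a real deployment would use.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class PaymentResult:
    transaction_id: str
    status: str


class IdempotentPoster:
    """Replays the stored result when the same idempotency key is seen again."""

    def __init__(self, post_fn):
        self._post_fn = post_fn   # hypothetical callable that performs the real debit
        self._results = {}        # key -> (payload fingerprint, result); in-memory for the sketch
        self._lock = threading.Lock()

    def pay(self, idempotency_key: str, payload: dict) -> PaymentResult:
        # Payload values are assumed hashable (strings, numbers) in this sketch.
        fingerprint = tuple(sorted(payload.items()))
        # Holding one lock across the post serialises all calls; acceptable here,
        # a real system would lock per key.
        with self._lock:
            if idempotency_key in self._results:
                stored_fingerprint, result = self._results[idempotency_key]
                if stored_fingerprint != fingerprint:
                    raise ValueError("idempotency key reused with a different payload")
                return result     # retry: no second debit
            result = self._post_fn(payload)
            self._results[idempotency_key] = (fingerprint, result)
            return result
```

A retry with the same key and payload returns the first result without calling post_fn again; the same key with a different payload is rejected rather than silently replayed.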

2. Whitelisting & connectivity

We don’t accept payment traffic from the public internet. Partners come in through one of three lanes:

  • MPLS for top-tier banks and the central bank. Predictable latency, expensive, slow to provision (4-8 weeks).
  • Site-to-site IPSec VPN for mid-tier banks and large fintechs. Fast to provision, reasonable latency.
  • Public mTLS over a hardened gateway for fintech startups. Requires a client certificate plus IP whitelist plus JWT — three layers, because losing any one is survivable.
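
The three layers on the public lane can be sketched as three independent checks. This is an illustration under assumptions, not our gateway code: PartnerProfile and admit are hypothetical names, the certificate is reduced to a fingerprint comparison, and JWT verification is assumed to happen elsewhere and arrive as a boolean.

```python
import hmac
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class PartnerProfile:
    cert_fingerprint: str      # fingerprint of the partner's client certificate
    allowed_networks: tuple    # CIDR strings on the partner's IP whitelist


def admit(profile: PartnerProfile, *, presented_fingerprint: str,
          source_ip: str, jwt_is_valid: bool) -> list:
    """Return the names of the layers that failed; an empty list means admit."""
    failures = []
    # Constant-time comparison of the presented and expected fingerprints.
    if not hmac.compare_digest(presented_fingerprint, profile.cert_fingerprint):
        failures.append("client_certificate")
    ip = ipaddress.ip_address(source_ip)
    if not any(ip in ipaddress.ip_network(net) for net in profile.allowed_networks):
        failures.append("ip_allowlist")
    if not jwt_is_valid:       # assumed to be verified upstream of this function
        failures.append("jwt")
    return failures
```

Every layer is evaluated even after one fails, so the rejection log shows exactly which layers were breached and which still held.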

The cost per lane matters more than people realise: an MPLS port is ~$2,500/month, an IPSec tunnel is ~$200/month, and mTLS is free. We let the lane match the partner’s traffic, not their prestige.

3. Payer-type classification

Not every payer is the same. We classify them at the edge so downstream systems can specialise:

Payer type        | Source                           | Lane
------------------+----------------------------------+--------------
Bank customer     | Internet banking, mobile app     | Internal mTLS
MFS user          | bKash, Nagad, Rocket             | MPLS or IPSec
OTC agent         | Bank branch / agent terminal     | Internal MPLS
Card-on-file      | Visa / Mastercard direct debit   | Card schemes
Wallet aggregator | SSL Commerz, AamarPay, ShurjoPay | IPSec

Classification matters because of routing. A bank customer paying a utility bill goes through the core banking system’s posting engine — strict, slow, auditable. An MFS user goes through the wallet’s pre-funded float account — faster, but with weaker dispute rights. Both look like POST /bill/pay from the partner’s side, and that’s the point. We absorb the complexity.

4. Routing

Routing rules are owned by ops, not engineering. The actual rule table is a few hundred rows long; the structure is small:

(bill_type, payer_type, partner_id) → biller_endpoint, settlement_account

When a partner pings POST /bill/pay, we look up the (bill_type, payer_type, partner_id) tuple, pick the right biller endpoint, attach the right settlement account, and only then forward the call. The lookup is in a Redis cache; cache miss falls back to a SQL Server table; cache write-through is on every ops update.

One rule applies across the board: every cached routing decision has a TTL. We expire the cache aggressively (60 seconds) because ops will flip a routing rule mid-day during a partner-side outage, and the system must pick it up within the minute even if a write-through is missed.
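
The lookup path can be sketched as follows. This is a simplified stand-in under assumptions: plain dicts replace Redis and the SQL Server table, and RoutingTable, Route, and update_rule are hypothetical names. The clock is injectable so the TTL behaviour can be exercised without waiting.

```python
import time
from typing import Callable, NamedTuple, Optional


class Route(NamedTuple):
    biller_endpoint: str
    settlement_account: str


class RoutingTable:
    def __init__(self, rules: dict, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._rules = rules    # stand-in for the SQL table: key -> Route
        self._cache = {}       # stand-in for Redis: key -> (expires_at, Route)
        self._ttl = ttl_seconds
        self._clock = clock

    def lookup(self, bill_type: str, payer_type: str, partner_id: str) -> Optional[Route]:
        key = (bill_type, payer_type, partner_id)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                   # fresh cache hit
        route = self._rules.get(key)          # miss or expired: fall back to the table
        if route is not None:
            self._cache[key] = (now + self._ttl, route)
        else:
            self._cache.pop(key, None)        # drop a stale entry for a deleted rule
        return route

    def update_rule(self, key: tuple, route: Route) -> None:
        """Ops update: write the table and the cache together (write-through)."""
        self._rules[key] = route
        self._cache[key] = (self._clock() + self._ttl, route)
```

With write-through, an ops change is visible on the next lookup; the TTL bounds how long a change that bypassed update_rule can stay hidden.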

5. Reconciliation

Reconciliation is where engineers and accountants meet. The cycle for a single day looks like this:

  1. T+0 23:55 — partner posts their settlement file (CSV or XML).
  2. T+1 00:30 — our system fetches the biller’s posting file.
  3. T+1 01:00 — match script runs three-way: partner transactions vs. our ledger vs. biller postings.
  4. T+1 02:00 — break list published to ops dashboard.
  5. T+1 09:00 — ops works the breaks; majority resolve within 4 hours.

A break is any transaction where at least one of the three sources disagrees. The taxonomy we use:

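Break code | Meaning                                       | Owner
-----------+-----------------------------------------------+------------
B1         | In partner, missing in our ledger             | Ops
B2         | In our ledger, missing at biller              | Eng
B3         | Amount mismatch                               | Partner ops
B4         | Status mismatch (we have OK, biller has FAIL) | Eng
B5         | Currency or date mismatch                     | Ops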

Engineering owns B2 and B4; both are usually our timeout/retry mistakes. B1, B3, and B5 are worked by ops, ours or the partner’s, as the Owner column shows.
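
A minimal sketch of the three-way match, assuming all three sources share a transaction ID. The Txn fields, the function names, and the precedence between break codes (B1 before B2 before B3, and so on) are assumptions for illustration; cases outside the taxonomy, such as a row present only at the biller, are not flagged here.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Txn:
    txn_id: str
    amount: Decimal
    currency: str
    value_date: date
    status: str = "OK"


def classify(partner: Optional[Txn], ledger: Optional[Txn],
             biller: Optional[Txn]) -> Optional[str]:
    """Return a break code for one transaction ID, or None if the sources agree."""
    if partner is not None and ledger is None:
        return "B1"   # in partner, missing in our ledger
    if ledger is not None and biller is None:
        return "B2"   # in our ledger, missing at biller
    present = [t for t in (partner, ledger, biller) if t is not None]
    if len({t.amount for t in present}) > 1:
        return "B3"   # amount mismatch
    if ledger and biller and ledger.status == "OK" and biller.status == "FAIL":
        return "B4"   # status mismatch
    if len({(t.currency, t.value_date) for t in present}) > 1:
        return "B5"   # currency or date mismatch
    return None


def three_way_match(partner_rows, ledger_rows, biller_rows) -> dict:
    """Map txn_id -> break code for every transaction that breaks."""
    partner = {t.txn_id: t for t in partner_rows}
    ledger = {t.txn_id: t for t in ledger_rows}
    biller = {t.txn_id: t for t in biller_rows}
    breaks = {}
    for txn_id in sorted(partner.keys() | ledger.keys() | biller.keys()):
        code = classify(partner.get(txn_id), ledger.get(txn_id), biller.get(txn_id))
        if code is not None:
            breaks[txn_id] = code
    return breaks
```

Amounts are Decimal rather than float so that equality checks on money are exact.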

Average daily break count (last 90 days)

Break                | Avg. per day
---------------------+-------------
B1 In partner only   | 18
B2 Missing at biller | 6
B3 Amount mismatch   | 12
B4 Status mismatch   | 3
B5 Currency / date   | 4

B1 dominates because partners often send the next day’s file containing rows that haven’t reached us yet. These breaks auto-resolve at T+2.

6. OTC agent networks

Over-the-counter is the messy bit. Agents are small shops, kiosks, and bank branches scattered across regions; they accept cash and post a payment on the customer’s behalf. The complications:

  • Cash float reconciliation is daily and physical — an actual person counts cash and submits a sheet.
  • Connectivity is unreliable — many agents are on 3G; we cache the agent’s last-known bill lookups locally on their terminal and sync on reconnect.
  • Trust is bounded — every agent has a daily cash limit; the system refuses postings beyond that limit until the agent settles.
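
The bounded-trust rule can be sketched as a small guard on the agent’s unsettled cash. AgentFloat, AgentLimitExceeded, and the method names are hypothetical, and the sketch assumes the limit applies to cash collected since the last settlement.

```python
from decimal import Decimal


class AgentLimitExceeded(Exception):
    """Raised when a posting would take the agent past the daily cash limit."""


class AgentFloat:
    def __init__(self, daily_limit: Decimal):
        self.daily_limit = daily_limit
        self.unsettled = Decimal("0")   # cash collected since the last settlement

    def post_cash_payment(self, amount: Decimal) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.unsettled + amount > self.daily_limit:
            # Refuse the posting; the agent must settle before posting more.
            raise AgentLimitExceeded(
                f"posting {amount} would exceed the limit of {self.daily_limit}"
            )
        self.unsettled += amount

    def settle(self) -> Decimal:
        """Record that the agent handed in the cash; returns the amount settled."""
        settled = self.unsettled
        self.unsettled = Decimal("0")
        return settled
```

A refused posting leaves the unsettled balance untouched, so the agent can retry the same payment after settling.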

The OTC flow looks almost identical to a normal bank-customer flow at the biller end, but at the bank end it goes through a separate agent banking core. We hide all of that behind the same POST /bill/pay.

What goes wrong (and what to invest in)

Three categories soak up most engineering time:

  • Partner-side timeouts. A partner whose code hangs on our slow path retries after 30 s, and now we have two postings to reconcile. The fix is partner-side: shorter timeouts and idempotency keys. We push hard for both during onboarding.
  • Biller-side reversals. A biller decides 24 hours later that a payment was invalid and reverses it. Our ledger now has a “reversed” row but the partner doesn’t. We push reversal webhooks; partners that don’t subscribe see breaks the next day.
  • Routing changes. Ops flips a routing rule and forgets to write to the audit log. When something breaks, we can’t tell what changed. We made the audit log non-negotiable in code.
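
One way to make an audit log non-negotiable in code is to leave no path that changes a rule without writing an entry first. The sketch below shows the idea only; AuditedRules, AuditEntry, and audit_sink are hypothetical names, and our actual implementation may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    at: datetime
    actor: str
    reason: str
    key: tuple
    old: object
    new: object


class AuditedRules:
    def __init__(self, audit_sink):
        self._rules = {}
        self._audit_sink = audit_sink   # hypothetical callable that persists an entry

    def get(self, key: tuple):
        return self._rules.get(key)

    def set_rule(self, key: tuple, new_value, *, actor: str, reason: str) -> None:
        if not actor.strip() or not reason.strip():
            raise ValueError("actor and reason are required for a routing change")
        entry = AuditEntry(datetime.now(timezone.utc), actor, reason,
                           key, self._rules.get(key), new_value)
        # Audit first: if the sink raises, the rule is left unchanged.
        self._audit_sink(entry)
        self._rules[key] = new_value
```

Because set_rule is the only mutator and it writes the entry before applying the change, a failed audit write blocks the routing change instead of producing an untraceable one.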

Integration work is a long tail of small contracts, not a heroic architecture. The teams that ship are the ones that treat onboarding as a product, with its own backlog, its own metrics, and its own quarterly OKRs — not as a sales-engineering afterthought.