Building a Dispute Automation System in .NET 9
A production-grade .NET 9 Web API for utility-bill disputes — layered architecture, repository + service-invoker patterns, two-level logging, and a step-by-step partner onboarding flow.
A bill dispute sounds prosaic until you trace one. A customer claims they paid; our ledger says they didn’t; the biller says we did pay them; the partner says the customer never asked. Resolving that takes a person 20 minutes today; the goal of this system was to take it to under 2 minutes across the cases that don’t actually need a human.
This is the architecture we shipped on .NET 9, the patterns we picked, and the parts where I’d choose differently next time.
The shape of the API
POST /api/v1/disputes → create
GET /api/v1/disputes/{id} → fetch one
GET /api/v1/disputes?status=open → query
POST /api/v1/disputes/{id}/resolve → close (or escalate)
POST /api/v1/disputes/{id}/refund → ledger refund
REST is enough. We considered gRPC for the partner-side fan-out; we rejected it because the partners we onboard most often (mid-tier banks) have HTTP/1.1 stacks and no appetite for a new transport.
Layering
A boring five-layer cake, with one twist (the service invoker):
Controllers → HTTP surface, validation, response shaping
↓
Services → orchestration, business rules
↓ (via ServiceInvoker)
External Connectors → biller APIs, ledger system, notification queue
↓
Repositories → data access (Dapper over SQL Server)
↓
Domain models → POCOs, value objects
Controllers are kept dumb. They:
- Validate the request (FluentValidation).
- Resolve the right service.
- Wrap the call in a try/catch that the global exception filter handles.
- Map domain results to the canonical response envelope.
They do not know about repositories, connectors, or transactions.
The Service Invoker
The non-obvious pattern in this design is the service invoker. Every external call goes through it:
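    using System.Diagnostics;
    using System.Net.Http.Json;
    using Microsoft.Extensions.Logging;
    using Microsoft.Extensions.Options;

    public sealed class ServiceInvoker : IServiceInvoker
    {
        private readonly ILogger<ServiceInvoker> _log;
        private readonly IHttpClientFactory _http;
        private readonly IOptions<DownstreamOptions> _opts;

        // Dependencies are supplied by the DI container.
        public ServiceInvoker(
            ILogger<ServiceInvoker> log,
            IHttpClientFactory http,
            IOptions<DownstreamOptions> opts)
        {
            _log = log;
            _http = http;
            _opts = opts;
        }

        public async Task<Result<TResp>> Invoke<TReq, TResp>(
            string endpointName,
            TReq payload,
            CancellationToken ct = default)
        {
            // Per-endpoint settings (named client, path) come from configuration.
            var ep = _opts.Value.Endpoints[endpointName];
            using var client = _http.CreateClient(ep.HttpClientName);
            // Reuse the ambient trace id when there is one so both log levels correlate.
            var traceId = Activity.Current?.TraceId.ToString() ?? Guid.NewGuid().ToString("N");
            _log.LogInformation("Invoke {Endpoint} trace={Trace} payload={Payload}",
                endpointName, traceId, payload);
            var sw = Stopwatch.StartNew();
            try
            {
                using var resp = await client.PostAsJsonAsync(ep.Path, payload, ct);
                _log.LogInformation("Invoke {Endpoint} status={Status} ms={Ms}",
                    endpointName, (int)resp.StatusCode, sw.ElapsedMilliseconds);
                // Check the status before deserializing: an error body rarely matches TResp,
                // and a JSON failure would otherwise hide the real status code.
                if (!resp.IsSuccessStatusCode)
                    return Result<TResp>.Fail($"{endpointName} returned {(int)resp.StatusCode}");
                var body = await resp.Content.ReadFromJsonAsync<TResp>(cancellationToken: ct);
                return body is null
                    ? Result<TResp>.Fail($"{endpointName} returned an empty body")
                    : Result<TResp>.Ok(body);
            }
            catch (Exception ex)
            {
                _log.LogError(ex, "Invoke {Endpoint} threw after {Ms}ms",
                    endpointName, sw.ElapsedMilliseconds);
                return Result<TResp>.Fail(ex.Message);
            }
        }
    }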
Three benefits:
- One place to log every downstream call. The auditors love this.
- One place to configure timeouts, retries and headers per partner. No more sprinkling Polly policies across the codebase.
- Easy unit tests. Services depend on IServiceInvoker, which is trivial to mock.
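To make the testing benefit concrete, here is a minimal sketch of a hand-rolled fake. Only the `IServiceInvoker.Invoke` signature and the `Result<T>.Ok`/`Fail` factories come from the code above; the `Result<T>` shape, `DisputeTriageService`, `BillLookupRequest`, `BillLookupResponse`, `Verdict` and `FakeInvoker` are hypothetical stand-ins invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Wire the service to the fake instead of an HTTP-backed invoker.
var fake = new FakeInvoker();
fake.Respond("bill-lookup", new BillLookupResponse(Paid: true));
var service = new DisputeTriageService(fake);

var verdict = await service.TriageAsync("INV-1", CancellationToken.None);
Console.WriteLine(verdict);          // AutoResolve
Console.WriteLine(fake.Calls.Count); // 1

// Assumed shape of the article's Result<T>; the real type may differ.
public sealed record Result<T>(bool IsOk, T Value, string Error)
{
    public static Result<T> Ok(T value) => new(true, value, "");
    public static Result<T> Fail(string error) => new(false, default!, error);
}

public interface IServiceInvoker
{
    Task<Result<TResp>> Invoke<TReq, TResp>(
        string endpointName, TReq payload, CancellationToken ct = default);
}

// Hypothetical request/response types and service, for illustration only.
public sealed record BillLookupRequest(string InvoiceId);
public sealed record BillLookupResponse(bool Paid);
public enum Verdict { AutoResolve, Escalate }

public sealed class DisputeTriageService
{
    private readonly IServiceInvoker _invoker;
    public DisputeTriageService(IServiceInvoker invoker) => _invoker = invoker;

    public async Task<Verdict> TriageAsync(string invoiceId, CancellationToken ct)
    {
        var r = await _invoker.Invoke<BillLookupRequest, BillLookupResponse>(
            "bill-lookup", new BillLookupRequest(invoiceId), ct);
        // Anything the biller cannot confirm goes to a human.
        return r.IsOk && r.Value.Paid ? Verdict.AutoResolve : Verdict.Escalate;
    }
}

// Fake that returns canned responses per endpoint name and records each call.
public sealed class FakeInvoker : IServiceInvoker
{
    private readonly Dictionary<string, object> _canned = new();
    public List<string> Calls { get; } = new();

    public void Respond<TResp>(string endpointName, TResp response) where TResp : notnull
        => _canned[endpointName] = response;

    public Task<Result<TResp>> Invoke<TReq, TResp>(
        string endpointName, TReq payload, CancellationToken ct = default)
    {
        Calls.Add(endpointName);
        return Task.FromResult(_canned.TryGetValue(endpointName, out var canned)
            ? Result<TResp>.Ok((TResp)canned)
            : Result<TResp>.Fail($"{endpointName} not configured"));
    }
}
```

A mocking library would work just as well; the point is that the service never touches HttpClient, so the test needs no network.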
Repository pattern, lightly
We use the repository pattern, but only as a seam, not as a deep abstraction. Repositories are thin Dapper wrappers; we don’t try to make them database-agnostic. The goal is testability, not portability.
    public interface IDisputeRepository
    {
        Task<Dispute?> GetAsync(Guid id, CancellationToken ct);
        Task<Guid> InsertAsync(Dispute d, CancellationToken ct);
        Task<int> UpdateStatusAsync(Guid id, DisputeStatus s, CancellationToken ct);
        Task<IReadOnlyList<Dispute>> QueryAsync(DisputeQuery q, CancellationToken ct);
    }
The one rule we enforce: no SQL outside repositories. If a service needs data, it asks a repository. If the query is too specific to fit a generic repository method, we add a new method on the repository — we do not let callers pass arbitrary SQL fragments. This rule, more than any other, has kept the project’s blast radius small.
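Because services only ever see IDisputeRepository, a test can swap the Dapper implementation for an in-memory one. The sketch below illustrates that seam; it is not the production repository. The interface is copied from above, while the shapes of `Dispute`, `DisputeStatus` and `DisputeQuery` are assumptions made so the example compiles.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var repo = new InMemoryDisputeRepository();
var id = await repo.InsertAsync(
    new Dispute(Guid.Empty, "BANK_A", DisputeStatus.Open), CancellationToken.None);
var changed = await repo.UpdateStatusAsync(id, DisputeStatus.Resolved, CancellationToken.None);
var open = await repo.QueryAsync(new DisputeQuery(DisputeStatus.Open), CancellationToken.None);
Console.WriteLine($"{changed} row updated, {open.Count} still open"); // 1 row updated, 0 still open

// Assumed domain shapes; the real models are not shown in the article.
public enum DisputeStatus { Open, Resolved, Escalated }
public sealed record Dispute(Guid Id, string PartnerId, DisputeStatus Status);
public sealed record DisputeQuery(DisputeStatus Status);

public interface IDisputeRepository
{
    Task<Dispute?> GetAsync(Guid id, CancellationToken ct);
    Task<Guid> InsertAsync(Dispute d, CancellationToken ct);
    Task<int> UpdateStatusAsync(Guid id, DisputeStatus s, CancellationToken ct);
    Task<IReadOnlyList<Dispute>> QueryAsync(DisputeQuery q, CancellationToken ct);
}

// Test double: same contract as the Dapper repository, no SQL involved.
public sealed class InMemoryDisputeRepository : IDisputeRepository
{
    private readonly ConcurrentDictionary<Guid, Dispute> _rows = new();

    public Task<Dispute?> GetAsync(Guid id, CancellationToken ct)
        => Task.FromResult<Dispute?>(_rows.TryGetValue(id, out var d) ? d : null);

    public Task<Guid> InsertAsync(Dispute d, CancellationToken ct)
    {
        // The repository assigns the id, mirroring a database-generated key.
        var id = Guid.NewGuid();
        _rows[id] = d with { Id = id };
        return Task.FromResult(id);
    }

    public Task<int> UpdateStatusAsync(Guid id, DisputeStatus s, CancellationToken ct)
    {
        // Return the affected-row count, as a SQL UPDATE would.
        if (!_rows.TryGetValue(id, out var current)) return Task.FromResult(0);
        _rows[id] = current with { Status = s };
        return Task.FromResult(1);
    }

    public Task<IReadOnlyList<Dispute>> QueryAsync(DisputeQuery q, CancellationToken ct)
    {
        IReadOnlyList<Dispute> hits = _rows.Values.Where(d => d.Status == q.Status).ToList();
        return Task.FromResult(hits);
    }
}
```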
Two-level logging
Every request gets two log surfaces:
- Audit log — to SQL, indexed on (partner, customer, dispute_id). One row per state transition. Compliance reads this.
- Application log — to a JSON file, shipped to Loki. Engineers read this.
Both share a traceId. The audit log is intentionally low-cardinality;
the app log is intentionally high-cardinality. Mixing the two is the
fastest way to ruin both.
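As a rough illustration of the split, the sketch below routes state transitions to an audit sink and everything else to an application sink, both stamped with the same traceId. `TwoLevelLogger`, `AuditRow` and the two sink delegates are hypothetical; in the real system the sinks would be the SQL audit table and the JSON file shipped to Loki.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// In-memory sinks stand in for the SQL table and the JSON log file.
var auditRows = new List<AuditRow>();
var appLines = new List<string>();
var logger = new TwoLevelLogger(auditRows.Add, appLines.Add);

var traceId = Guid.NewGuid().ToString("N");
logger.App(traceId, "bill-lookup returned 200", new { ms = 182, attempt = 1 });
logger.Transition(traceId, "BANK_A", "CUST-1", Guid.NewGuid(), "Open", "Resolved");

Console.WriteLine(auditRows[0].TraceId == traceId); // True
Console.WriteLine(appLines[0].Contains(traceId));   // True

// One row per state transition, with a small fixed set of columns.
public sealed record AuditRow(
    string TraceId, string Partner, string Customer, Guid DisputeId,
    string From, string To, DateTimeOffset At);

public sealed class TwoLevelLogger
{
    private readonly Action<AuditRow> _audit;
    private readonly Action<string> _app;

    public TwoLevelLogger(Action<AuditRow> audit, Action<string> app)
    {
        _audit = audit;
        _app = app;
    }

    // Audit level: structured and narrow, written only on state changes.
    public void Transition(string traceId, string partner, string customer,
                           Guid disputeId, string from, string to)
        => _audit(new AuditRow(traceId, partner, customer, disputeId, from, to,
                               DateTimeOffset.UtcNow));

    // Application level: free-form detail serialized as one JSON line.
    public void App(string traceId, string message, object detail)
        => _app(JsonSerializer.Serialize(new { traceId, message, detail }));
}
```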
Configuration
appsettings.json holds non-secret toggles; secrets live in
Azure Key Vault, mounted at startup. Per-environment overrides are
appsettings.Production.json and an IConfigurationSource that pulls
partner-specific endpoints from a SQL table at boot.
The partner-endpoint table looks like this:
| partner_id | endpoint_name | url | timeout_ms | retry_count |
|---|---|---|---|---|
| BANK_A | bill-lookup | https://core-a/billing/lookup | 1500 | 2 |
| BANK_A | bill-refund | https://core-a/billing/refund | 4000 | 0 |
| BANK_B | bill-lookup | https://core-b/api/v2/lookup | 2000 | 1 |
When ops onboards a new partner, they add rows here. No deploy.
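One way such rows could be turned into typed settings is sketched below. `PartnerEndpointRow`, `EndpointSettings` and `EndpointCatalog` are hypothetical names, and reading retry_count as "retries after the first attempt" is an assumption, not something the table itself states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Rows copied from the table above, as they might come back from SQL.
var rows = new[]
{
    new PartnerEndpointRow("BANK_A", "bill-lookup", "https://core-a/billing/lookup", 1500, 2),
    new PartnerEndpointRow("BANK_A", "bill-refund", "https://core-a/billing/refund", 4000, 0),
    new PartnerEndpointRow("BANK_B", "bill-lookup", "https://core-b/api/v2/lookup", 2000, 1),
};

var catalog = new EndpointCatalog(rows);
var ep = catalog.Resolve("BANK_A", "bill-refund");
Console.WriteLine($"{ep.Url} attempts={ep.MaxAttempts}");
// https://core-a/billing/refund attempts=1

public sealed record PartnerEndpointRow(
    string PartnerId, string EndpointName, string Url, int TimeoutMs, int RetryCount);

public sealed record EndpointSettings(Uri Url, TimeSpan Timeout, int MaxAttempts);

public sealed class EndpointCatalog
{
    private readonly Dictionary<(string Partner, string Endpoint), EndpointSettings> _byKey;

    public EndpointCatalog(IEnumerable<PartnerEndpointRow> rows)
    {
        // Key on (partner, endpoint) so each partner can tune each call independently.
        _byKey = rows.ToDictionary(
            r => (r.PartnerId, r.EndpointName),
            r => new EndpointSettings(
                new Uri(r.Url),
                TimeSpan.FromMilliseconds(r.TimeoutMs),
                r.RetryCount + 1)); // assumption: retry_count excludes the first attempt
    }

    public EndpointSettings Resolve(string partnerId, string endpointName)
        => _byKey.TryGetValue((partnerId, endpointName), out var settings)
            ? settings
            : throw new KeyNotFoundException($"No endpoint '{endpointName}' for partner '{partnerId}'");
}
```

Note that the refund row carries a retry_count of 0: retrying a non-idempotent refund is exactly the kind of per-endpoint decision the table makes explicit.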
Onboarding a new client (the punchlist)
This is the runbook ops follows; engineering provides one new SQL row and two config entries:
- Ops signs the Technical Schedule with the partner.
- Engineering adds the partner to the partners table (id, name, public_key).
- Engineering adds endpoint rows to partner_endpoints.
- Engineering issues an mTLS client certificate via Key Vault.
- Ops runs the /healthz/partner/{partner_id} smoke probe.
- Ops flips the partner to active = true.
The smoke probe is critical: it exercises every endpoint with a synthetic zero-amount transaction. If any endpoint fails, the partner stays inactive until every endpoint passes.
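The all-or-nothing gate can be sketched as follows. This is an illustration of the rule only: `SmokeProbe`, `ProbeReport` and the per-endpoint probe delegates are hypothetical, and the real probe would send the synthetic zero-amount transaction through the ServiceInvoker rather than return canned booleans.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// One hypothetical probe per configured endpoint; true means the synthetic call succeeded.
var probes = new Dictionary<string, Func<CancellationToken, Task<bool>>>
{
    ["bill-lookup"] = _ => Task.FromResult(true),
    ["bill-refund"] = _ => Task.FromResult(false), // simulate a failing endpoint
};

var report = await SmokeProbe.RunAsync(probes, CancellationToken.None);
Console.WriteLine(report.CanActivate);              // False
Console.WriteLine(string.Join(",", report.Failed)); // bill-refund

public sealed record ProbeReport(IReadOnlyList<string> Failed)
{
    // The partner may be activated only when nothing failed.
    public bool CanActivate => Failed.Count == 0;
}

public static class SmokeProbe
{
    public static async Task<ProbeReport> RunAsync(
        IReadOnlyDictionary<string, Func<CancellationToken, Task<bool>>> probes,
        CancellationToken ct)
    {
        var failed = new List<string>();
        foreach (var (name, probe) in probes)
        {
            bool ok;
            try { ok = await probe(ct); }
            catch (Exception) { ok = false; } // a thrown probe counts as a failure
            if (!ok) failed.Add(name);
        }
        return new ProbeReport(failed);
    }
}
```

Running every probe instead of stopping at the first failure gives ops the full list of broken endpoints in one pass.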
Throughput
Pre-rewrite (a PHP monolith), the system handled ~40 disputes per minute under sustained load before queue depth blew up. The .NET 9 rewrite hit the following profile on a single 4-core pod:
What I’d change
- Source-generate the canonical envelope. We hand-wrote it. A Roslyn source generator that produces the response classes from a single schema would have saved a hundred small edits.
- Move audit writes off the request path. We block the response on the audit log write. It’s fast, but a queue with redelivery on failure would be more honest about the durability requirement.
- Adopt OpenTelemetry from the start. We bolted it on six months in. The instrumentation density is uneven, and we keep finding code paths that silently break the trace.
The biggest lesson is unsexy: the patterns that paid off (ServiceInvoker,
thin repositories, the partner-endpoint table) are all about single
points of variation. Anywhere a partner could be different from another
partner — timeouts, headers, signing — we forced into a single, declarative
table or a single class. That’s what makes onboarding a new partner cheap;
it’s the only metric the business cares about.