fullstackhero

Concept

HTTP resilience

Polly v8 via Microsoft.Extensions.Http.Resilience — retry, circuit breaker, timeouts — opt-in per HttpClient with the kit's AddHeroResilience extension.


Microsoft.Extensions.Http.Resilience (Polly v8 under the hood) backs the kit’s outbound HTTP calls. The Web block ships an AddHeroResilience extension that attaches the standard resilience pipeline to any IHttpClientBuilder, configured from a single appsettings section. That pipeline combines retry with backoff on transient failures, a circuit breaker that fails fast when a downstream is consistently broken, and total plus per-attempt timeouts so a hung downstream can't leave callers and connections waiting indefinitely.

How it’s wired

Resilience is opt-in per client, not global. The Webhooks module is the shipped example:

WebhooksModule.ConfigureServices
builder.Services.AddHttpClient("Webhooks")
    .AddHeroResilience(builder.Configuration);

AddHeroResilience reads the HttpResilienceOptions section and calls AddStandardResilienceHandler with those values. When you add your own outbound integration, do the same — name the client, chain the extension.
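The kit's own source isn't reproduced on this page, so the following is only a sketch of what an extension like AddHeroResilience could do. It assumes each key in the HttpResilienceOptions section maps one-to-one onto the matching property of the library's HttpStandardResilienceOptions. The helper name AddResilienceFromConfig and the client name "Payments" are hypothetical. It needs the Microsoft.Extensions.Http.Resilience, Microsoft.Extensions.Configuration and Microsoft.Extensions.Configuration.Binder packages.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-in for AddHeroResilience: read one section and apply it to the
// standard resilience handler. Missing keys fall back to the kit defaults listed under Configuration.
static IHttpClientBuilder AddResilienceFromConfig(IHttpClientBuilder builder, IConfiguration configuration)
{
    IConfigurationSection section = configuration.GetSection("HttpResilienceOptions");
    if (!section.GetValue("Enabled", true))
    {
        return builder; // Enabled = false: no resilience handler at all
    }

    builder.AddStandardResilienceHandler(opts =>
    {
        opts.TotalRequestTimeout.Timeout = section.GetValue("TotalTimeout", TimeSpan.FromSeconds(30));
        opts.AttemptTimeout.Timeout = section.GetValue("AttemptTimeout", TimeSpan.FromSeconds(10));
        opts.Retry.MaxRetryAttempts = section.GetValue("MaxRetryAttempts", 3);
        // With exponential backoff plus jitter (the standard defaults), Polly treats
        // Delay as the median delay before the first retry.
        opts.Retry.Delay = section.GetValue("MedianFirstRetryDelay", TimeSpan.FromSeconds(1));
        opts.CircuitBreaker.BreakDuration = section.GetValue("CircuitBreakerBreakDuration", TimeSpan.FromSeconds(5));
        opts.CircuitBreaker.FailureRatio = section.GetValue("CircuitBreakerFailureRatio", 0.5);
        opts.CircuitBreaker.MinimumThroughput = section.GetValue("CircuitBreakerMinimumThroughput", 10);
    });
    return builder;
}

// Usage: an in-memory configuration stands in for appsettings.json.
IConfiguration config = new ConfigurationBuilder()
    .AddInMemoryCollection(new Dictionary<string, string?>
    {
        ["HttpResilienceOptions:MaxRetryAttempts"] = "2",
    })
    .Build();

var services = new ServiceCollection();
AddResilienceFromConfig(services.AddHttpClient("Payments"), config);

using ServiceProvider provider = services.BuildServiceProvider();
HttpClient client = provider.GetRequiredService<IHttpClientFactory>().CreateClient("Payments");
Console.WriteLine("Resolved client 'Payments'");
```

Reading keys individually with defaults keeps the sketch short; a real extension would more likely bind the section to an options class and validate it at startup.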

What the standard pipeline does

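Layers run outermost first; each one wraps the layers below it:

[Limiter] Concurrency limiter (standard-handler default; the kit's settings don't cover it)
[Total] Total request timeout (30 s default), spanning every attempt and backoff delay
[Retry] Up to 3 retries (4 attempts in total) on transient errors, with exponential backoff
[Breaker] Circuit breaker: opens at a 50% failure ratio over at least 10 requests, then breaks for 5 s
[Attempt] Per-attempt timeout (10 s default)
[Send] The network send itself (the client's primary HttpMessageHandler)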
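Because the total timeout sits outside the retry layer, it caps the whole sequence. The sketch below works through the worst case with the default values. It assumes every attempt hangs until its 10 s attempt timeout and uses the nominal exponential delays of 1, 2 and 4 s implied by the 1 s first-retry delay in the Configuration section; the real pipeline adds jitter, so actual delays vary. The helper names are illustrative.

```csharp
using System;

// Kit defaults (see the Configuration section).
TimeSpan totalTimeout = TimeSpan.FromSeconds(30);
TimeSpan attemptTimeout = TimeSpan.FromSeconds(10);
TimeSpan firstRetryDelay = TimeSpan.FromSeconds(1);
int maxRetryAttempts = 3;

// How long the pipeline would run if nothing capped it: every attempt times out,
// plus the nominal delay before each retry (1 s, 2 s, 4 s; jitter ignored).
static TimeSpan UncappedWorstCase(TimeSpan perAttempt, TimeSpan firstDelay, int maxRetries)
{
    TimeSpan total = perAttempt * (maxRetries + 1);
    for (int retry = 0; retry < maxRetries; retry++)
        total += firstDelay * Math.Pow(2, retry);
    return total;
}

// How many attempts actually start before the outer total timeout cancels everything.
static int AttemptsStarted(TimeSpan total, TimeSpan perAttempt, TimeSpan firstDelay, int maxRetries)
{
    TimeSpan elapsed = TimeSpan.Zero;
    int started = 0;
    for (int attempt = 0; attempt <= maxRetries && elapsed < total; attempt++)
    {
        started++;
        elapsed += perAttempt;                            // attempt runs until its own timeout
        if (attempt < maxRetries)
            elapsed += firstDelay * Math.Pow(2, attempt); // backoff before the next retry
    }
    return started;
}

Console.WriteLine(UncappedWorstCase(attemptTimeout, firstRetryDelay, maxRetryAttempts)); // 00:00:47
Console.WriteLine(AttemptsStarted(totalTimeout, attemptTimeout, firstRetryDelay, maxRetryAttempts)); // 3
```

Uncapped, four hung attempts plus backoff would take 47 s. With the 30 s total timeout in place, the third attempt starts at 23 s and is cut off at 30 s, and the fourth never starts.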

Configuration

{
  "HttpResilienceOptions": {
    "Enabled": true,                        // default; false skips the handler entirely
    "MaxRetryAttempts": 3,
    "MedianFirstRetryDelay": "00:00:01",    // median delay before the first retry; backoff grows from here
    "TotalTimeout": "00:00:30",             // whole request, including retries
    "AttemptTimeout": "00:00:10",           // each individual attempt
    "CircuitBreakerBreakDuration": "00:00:05",
    "CircuitBreakerFailureRatio": 0.5,
    "CircuitBreakerMinimumThroughput": 10
  }
}

These are the code defaults: the shipped appsettings.json doesn't carry the section, so you only add it when tuning. One section configures every client that opted in via AddHeroResilience. For a genuinely different upstream (a slow third-party API), register that client with its own AddStandardResilienceHandler(opts => ...) lambda instead of AddHeroResilience, so it gets one pipeline tuned to it rather than two stacked ones.
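A per-client override might look like the sketch below. The client name SlowPartnerApi, its base address and every number are illustrative assumptions. One detail worth knowing: the library validates the standard options, and it requires the circuit breaker's sampling window to be at least twice the attempt timeout, so raising the attempt timeout usually means raising SamplingDuration too.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Hypothetical slow third-party API, registered WITHOUT AddHeroResilience so it gets
// exactly one resilience pipeline, tuned for its own latency profile.
services.AddHttpClient("SlowPartnerApi", client =>
    {
        client.BaseAddress = new Uri("https://partner.example/");
    })
    .AddStandardResilienceHandler(opts =>
    {
        opts.AttemptTimeout.Timeout = TimeSpan.FromSeconds(30);      // slow but healthy responses
        opts.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(90); // must exceed the attempt timeout
        opts.Retry.MaxRetryAttempts = 2;
        // Sampling window must be at least 2x the attempt timeout (the default window is 30 s).
        opts.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(60);
    });

using ServiceProvider provider = services.BuildServiceProvider();
HttpClient partner = provider.GetRequiredService<IHttpClientFactory>().CreateClient("SlowPartnerApi");
Console.WriteLine(partner.BaseAddress);
```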

How resilience integrates with [AutomaticRetry]

Hangfire’s [AutomaticRetry] and the HTTP resilience pipeline are two separate retry layers. The kit uses both, deliberately:

  • HTTP-layer retry handles transient network blips (5xx, timeouts) within a single job attempt — fast, in-process, no job queue churn.
  • Job-layer retry handles longer outages — minutes-to-hours-scale backoff while the Hangfire job stays in the queue.

For the Webhooks module: each delivery attempt runs through the Polly pipeline (up to 3 in-process retries, seconds apart). If the delivery still fails after those, the Hangfire job fails and [AutomaticRetry(Attempts = 4, DelaysInSeconds = [30, 120, 600, 3600])] re-runs it: up to 5 job-level runs (the first plus 4 retries), the last starting roughly 1 h 12 min (30 + 120 + 600 + 3600 = 4350 s) after the first failure.
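The arithmetic behind those numbers, as a small sketch. It ignores the runs' own execution time and Hangfire's scheduler polling interval, so the offsets are approximate, and the worst-case send count assumes every run exhausts its in-process retries.

```csharp
using System;
using System.Linq;

// From the Webhooks job: [AutomaticRetry(Attempts = 4, DelaysInSeconds = [30, 120, 600, 3600])]
int[] jobRetryDelaysSeconds = { 30, 120, 600, 3600 };
// From HttpResilienceOptions: MaxRetryAttempts = 3 in-process retries per job run.
int httpRetriesPerRun = 3;

int jobRuns = jobRetryDelaysSeconds.Length + 1;             // first run + 4 job-level retries
int worstCaseHttpSends = jobRuns * (httpRetriesPerRun + 1); // 4 HTTP attempts per run

// Approximate start of each retry run, measured from the first run's failure.
int offset = 0;
for (int i = 0; i < jobRetryDelaysSeconds.Length; i++)
{
    offset += jobRetryDelaysSeconds[i];
    Console.WriteLine($"Run {i + 2} at +{offset}s");
}

TimeSpan lastRun = TimeSpan.FromSeconds(jobRetryDelaysSeconds.Sum());
Console.WriteLine($"{jobRuns} runs, up to {worstCaseHttpSends} HTTP sends, last run ~{lastRun} after the first failure");
```

The retry runs land at roughly +30 s, +150 s, +750 s and +4350 s (01:12:30). A permanently failing receiver can therefore see up to 20 HTTP requests for a single delivery, which is one more reason the receiver should deduplicate.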

When NOT to retry

The standard resilience handler retries on transient failures only:

  • HTTP 408 Request Timeout
  • HTTP 429 Too Many Requests
  • HTTP 500+ server errors
  • Connection / timeout exceptions (HttpRequestException, attempt timeouts)

It does not retry permanent client errors — 400, 401, 403, 404, 405, 409 — because retrying won’t change the answer. For specific 4xx codes that should be treated as transient (rare — e.g. an API that returns 423 Locked on transient contention), configure ShouldHandle with a custom predicate on a per-client AddStandardResilienceHandler registration.
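A sketch of such a per-client predicate, assuming a hypothetical upstream registered as "LockingUpstream" that answers 423 Locked under brief contention. It reuses the library's built-in transient check (HttpClientResiliencePredicates.IsTransient) and widens it by one status code, rather than re-listing the transient rules by hand.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

var services = new ServiceCollection();

// Hypothetical upstream that answers 423 Locked while a resource is briefly contended.
services.AddHttpClient("LockingUpstream")
    .AddStandardResilienceHandler(opts =>
    {
        // Keep the built-in transient checks (408, 429, 5xx, HttpRequestException,
        // attempt timeouts) and additionally retry 423 Locked.
        opts.Retry.ShouldHandle = args => ValueTask.FromResult(
            HttpClientResiliencePredicates.IsTransient(args.Outcome) ||
            args.Outcome.Result?.StatusCode == HttpStatusCode.Locked);
    });

using ServiceProvider provider = services.BuildServiceProvider();
HttpClient client = provider.GetRequiredService<IHttpClientFactory>().CreateClient("LockingUpstream");
Console.WriteLine("Registered LockingUpstream");
```

Only the retry predicate changes here. The circuit breaker keeps its own default ShouldHandle, so 423 responses are retried but don't count toward opening the breaker.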

Circuit breaker

The breaker opens when the failure ratio reaches CircuitBreakerFailureRatio (50%) across at least CircuitBreakerMinimumThroughput (10) requests within its sampling window (the standard handler's SamplingDuration, 30 s by default; it isn't one of the keys above). While open, requests fail immediately with BrokenCircuitException instead of reaching the downstream. After CircuitBreakerBreakDuration (5 s) the breaker half-opens and lets a trial request through: success closes it and normal traffic resumes; failure re-opens it for another break.
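The opening rule fits in a few lines. This is a deliberately simplified model, not Polly's implementation: Polly tracks outcomes in time buckets across the sampling window and evaluates the rule when a failure is recorded. Here there is a single window and no clock; the function names are illustrative.

```csharp
using System;

// Kit defaults: CircuitBreakerFailureRatio = 0.5, CircuitBreakerMinimumThroughput = 10.
const double failureRatio = 0.5;
const int minimumThroughput = 10;

// The breaker opens only when BOTH thresholds are met: enough traffic, and enough of it failing.
static bool ShouldOpen(int failures, int throughput, double ratio, int minThroughput) =>
    throughput >= minThroughput && (double)failures / throughput >= ratio;

// After the break duration the breaker half-opens; the trial request's outcome decides the next state.
static string AfterTrial(bool trialSucceeded) => trialSucceeded ? "Closed" : "Open";

Console.WriteLine(ShouldOpen(9, 9, failureRatio, minimumThroughput));  // False: only 9 requests
Console.WriteLine(ShouldOpen(5, 10, failureRatio, minimumThroughput)); // True: 5 of 10 failed
Console.WriteLine(ShouldOpen(4, 10, failureRatio, minimumThroughput)); // False: 40% < 50%
Console.WriteLine(AfterTrial(false));                                  // Open: another break
```

The minimum-throughput guard is why a quiet client doesn't trip the breaker on a couple of unlucky failures: nine failures out of nine requests still leaves it closed.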

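This is what stops a failing downstream from cascading into your own service: callers get an immediate error instead of queuing up behind timeouts. Watch breaker open, half-open and close events in your telemetry; a breaker that re-opens after every half-open trial means the downstream is genuinely down and needs human attention.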
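The library's telemetry already reports these transitions as resilience events. If you want your own signal (an alert, a custom metric), the standard options expose callbacks for each state change. A sketch, assuming a hypothetical client named "Payments" registered directly with AddStandardResilienceHandler rather than through AddHeroResilience; Console stands in for your ILogger.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// Hypothetical client; each callback fires on the matching breaker state change.
services.AddHttpClient("Payments")
    .AddStandardResilienceHandler(opts =>
    {
        opts.CircuitBreaker.OnOpened = args =>
        {
            Console.WriteLine($"Breaker opened for {args.BreakDuration}");
            return ValueTask.CompletedTask;
        };
        opts.CircuitBreaker.OnHalfOpened = _ =>
        {
            Console.WriteLine("Breaker half-open: letting a trial request through");
            return ValueTask.CompletedTask;
        };
        opts.CircuitBreaker.OnClosed = _ =>
        {
            Console.WriteLine("Breaker closed: downstream recovered");
            return ValueTask.CompletedTask;
        };
    });

using ServiceProvider provider = services.BuildServiceProvider();
HttpClient client = provider.GetRequiredService<IHttpClientFactory>().CreateClient("Payments");
```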

Hedging (for read-only calls)

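The kit doesn’t ship a hedging pipeline, but Microsoft.Extensions.Http.Resilience includes one if you need it. Hedging sends an extra copy of a request when the previous one hasn't answered within a delay (or has failed); the first acceptable response wins and the others are cancelled. It trades extra load for lower tail latency: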

services.AddHttpClient("LatencySensitiveReads")
    .AddStandardHedgingHandler()
    .Configure(opts =>
    {
        opts.Hedging.MaxHedgedAttempts = 2; // up to 2 hedged copies on top of the original: 3 requests in flight
        opts.Hedging.Delay = TimeSpan.FromMilliseconds(200); // wait 200 ms for a response before sending the next copy
    });

Don’t enable it globally: it multiplies the downstream’s load. Reserve it for idempotent reads where tail latency matters, against upstreams that can absorb the extra volume.

Gotchas

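  • Configure timeouts on the pipeline, not the client. HttpClient.Timeout (100 s by default) sits outside the whole handler chain, so it bounds all retries together; set it below TotalTimeout and it silently cuts retries short. Leave it at or above TotalTimeout and treat the pipeline’s TotalTimeout as the effective bound.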
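  • Retry + non-idempotent requests = duplicate creates. If you POST and the pipeline retries on a 502, the first request may already have created the resource, leaving you with two. Send an Idempotency-Key header on every retried POST, keep the key stable across HTTP retries and job re-runs, and make sure the receiver deduplicates on it.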
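  • Look at InnerException. When a call fails with one of Polly’s exceptions (TimeoutRejectedException, BrokenCircuitException), the underlying transport failure, if there was one, is typically on InnerException.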
  • OpenTelemetry sees each attempt as its own HTTP client span. A request that succeeds on the third try shows three spans in the trace. This is correct; ignore the noise in dashboards.
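A sketch of the idempotency point, assuming each webhook delivery has a persisted ID (deliveryId and BuildDelivery are hypothetical) and that the receiver deduplicates on the Idempotency-Key header. Deriving the key from that ID, rather than calling Guid.NewGuid() per send, keeps it identical across in-process retries and Hangfire re-runs.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Text;

// Hypothetical persisted delivery ID; it survives Hangfire re-runs of the same job.
Guid deliveryId = Guid.Parse("3f2504e0-4f89-11d3-9a0c-0305e82c3301");

static HttpRequestMessage BuildDelivery(Guid id, string payloadJson)
{
    var request = new HttpRequestMessage(HttpMethod.Post, "https://receiver.example/webhooks")
    {
        Content = new StringContent(payloadJson, Encoding.UTF8, "application/json"),
    };
    // Derived from the delivery ID: the header travels with every pipeline retry
    // of this message, and a job re-run rebuilds exactly the same key.
    request.Headers.Add("Idempotency-Key", id.ToString("N"));
    return request;
}

static string KeyOf(HttpRequestMessage request) =>
    request.Headers.GetValues("Idempotency-Key").Single();

using HttpRequestMessage firstRun = BuildDelivery(deliveryId, "{\"event\":\"order.created\"}");
using HttpRequestMessage rerun = BuildDelivery(deliveryId, "{\"event\":\"order.created\"}");
Console.WriteLine(KeyOf(firstRun) == KeyOf(rerun)); // True: same key on both runs
```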