Microsoft.Extensions.Http.Resilience (Polly v8 under the hood) backs the kit’s outbound HTTP calls. The Web block ships an AddHeroResilience extension that attaches the standard resilience pipeline — retry with backoff on transient failures, a circuit breaker to fail fast when a downstream is consistently broken, and total + per-attempt timeouts to prevent hung requests from holding threadpool slots — to any IHttpClientBuilder, configured from one appsettings section.
How it’s wired
Resilience is opt-in per client, not global. The Webhooks module is the shipped example:
builder.Services.AddHttpClient("Webhooks") .AddHeroResilience(builder.Configuration);AddHeroResilience reads the HttpResilienceOptions section and calls AddStandardResilienceHandler with those values. When you add your own outbound integration, do the same — name the client, chain the extension.
What the standard pipeline does
[Outer] Total request timeout (30s default) ↓[Retry] Up to 3 attempts on transient errors with backoff ↓[Breaker] Circuit breaker — opens at 50% failure ratio, 5s break ↓[Attempt] Per-attempt timeout (10s default) ↓ HttpClient.SendAsyncConfiguration
{ "HttpResilienceOptions": { "Enabled": true, // default — false skips the handler entirely "MaxRetryAttempts": 3, "MedianFirstRetryDelay": "00:00:01", // first retry delay; backoff grows from here "TotalTimeout": "00:00:30", // whole request incl. retries "AttemptTimeout": "00:00:10", // each individual attempt "CircuitBreakerBreakDuration": "00:00:05", "CircuitBreakerFailureRatio": 0.5, "CircuitBreakerMinimumThroughput": 10 }}These are the code defaults — the shipped appsettings.json doesn’t carry the section, so you only add it when tuning. One section configures every client that opted in via AddHeroResilience; for genuinely different upstreams (a slow third-party API), register that client with its own AddStandardResilienceHandler(opts => ...) lambda instead.
How resilience integrates with [AutomaticRetry]
Hangfire’s [AutomaticRetry] and the HTTP resilience pipeline are two separate retry layers. The kit uses both, deliberately:
- HTTP-layer retry handles transient network blips (5xx, timeouts) within a single job attempt — fast, in-process, no job queue churn.
- Job-layer retry handles longer outages — minutes-to-hours-scale backoff while the Hangfire job stays in the queue.
For the Webhooks module: each delivery attempt runs through the Polly pipeline (up to 3 in-process retries, seconds apart). If the attempt still fails, the Hangfire job fails and [AutomaticRetry(Attempts = 4, DelaysInSeconds = [30, 120, 600, 3600])] re-runs it — up to 5 job-level attempts spread over ~1 h 12 min.
When NOT to retry
The standard resilience handler retries on transient failures only:
- HTTP 408 Request Timeout
- HTTP 429 Too Many Requests
- HTTP 500+ server errors
- Connection / timeout exceptions (
HttpRequestException, attempt timeouts)
It does not retry permanent client errors — 400, 401, 403, 404, 405, 409 — because retrying won’t change the answer. For specific 4xx codes that should be treated as transient (rare — e.g. an API that returns 423 Locked on transient contention), configure ShouldHandle with a custom predicate on a per-client AddStandardResilienceHandler registration.
Circuit breaker
The breaker opens when the failure ratio crosses CircuitBreakerFailureRatio (50%) over at least CircuitBreakerMinimumThroughput (10) requests. While open, requests fail fast with BrokenCircuitException instead of waiting for the timeout. After CircuitBreakerBreakDuration (5 s), the breaker half-opens — a trial request decides whether to close (resume normal traffic) or stay open (downstream still broken).
This is what stops a flaky downstream from cascading into your threadpool. Watch breaker open / close events in your traces; opens that don’t half-close mean the downstream is genuinely down and needs human attention.
Hedging (for read-only calls)
The kit doesn’t ship a hedging pipeline, but Microsoft.Extensions.Http.Resilience includes one if you need it. Hedging fires parallel requests; the first response wins, the rest are cancelled. Trades load for latency:
services.AddHttpClient("LatencySensitiveReads") .AddStandardHedgingHandler(opts => { opts.Hedging.MaxHedgedAttempts = 2; // up to 2 parallel requests opts.Hedging.Delay = TimeSpan.FromMilliseconds(200); // wait 200ms before firing the hedge });Don’t enable it globally — it multiplies the downstream’s load. Reserve it for high-percentile-sensitive reads against upstreams that can absorb the extra volume.
Gotchas
- Configure timeouts on the pipeline, not the client. The pipeline’s
TotalTimeoutis the effective bound; a mismatchedHttpClient.Timeoutjust adds a second, confusing limit. - Retry + non-idempotent requests = duplicate creates. If you POST and retry on 502, you might create two resources. Combine with Idempotency-Key on every retried POST.
- Look at
InnerException. Polly’s timeout/breaker exceptions wrap the underlying transport failure. - OpenTelemetry sees each attempt as its own HTTP client span. A request that succeeds on the third try shows three spans in the trace. This is correct; ignore the noise in dashboards.
Related
- Idempotency — required when retrying non-idempotent operations.
- Webhooks module — the shipped consumer.
- Background jobs — Hangfire
[AutomaticRetry]complements HTTP-layer retry.