Microsoft.Extensions.Http.Resilience (Polly v8 under the hood) is wired into every named HttpClient the kit ships. The default pipeline adds retry with exponential backoff on transient failures (5xx, 408, 429), a circuit breaker to fail fast when a downstream is consistently broken, an attempt timeout to prevent hung requests from holding threadpool slots, and an optional hedging strategy for read-only calls where latency variance matters more than load.
What the standard pipeline does
AddHeroPlatform calls AddStandardResilienceHandler on every named client. That adds the following pipeline:
[Outer] Total request timeout (e.g. 30s) ↓[Retry] Up to 3 attempts on transient errors with exponential backoff + jitter ↓[Breaker] Circuit breaker — opens after N consecutive failures, half-opens after cooldown ↓[Attempt] Per-attempt timeout (e.g. 10s) ↓ HttpClient.SendAsyncEach strategy is configurable individually; the defaults are tuned for the typical “internal API to internal API” call shape.
Named clients in the kit
The kit ships these named clients out of the box:
| Name | Where | Used by |
|---|---|---|
Webhooks | WebhooksModule.ConfigureServices | Outbound webhook deliveries |
Default | Implicit via IHttpClientFactory | Anywhere IHttpClientFactory.CreateClient() is called without a name |
The Webhooks module is the most visible consumer — WebhookDispatchJob POSTs payloads via the Webhooks client. The pipeline’s retry strategy is what gives webhook delivery its “retry transient failures, give up on permanent ones” behaviour.
How resilience integrates with [AutomaticRetry]
Hangfire’s [AutomaticRetry(Attempts = 4, DelaysInSeconds = ...)] and the HTTP resilience pipeline are two separate retry layers. The kit uses both, deliberately:
- HTTP-layer retry handles transient network issues (5xx, timeouts) within a single attempt — fast, in-process, no job queue churn.
- Job-layer retry handles longer-term outages — minutes-to-hours-scale backoff while the Hangfire job stays in the queue.
For the Webhooks module:
- 1 HTTP-layer retry runs the configured Polly pipeline (small backoff in seconds).
- If that exhausts, the Hangfire job fails →
[AutomaticRetry]kicks in with 30 s / 2 m / 10 m / 1 h delays.
Together, a single webhook delivery has up to 5 attempts spread over ~1 h 12 min.
Tuning the pipeline
The defaults are sensible. When you need to deviate, configure per-named-client:
services.AddHttpClient("SlowExternalApi", client =>{ client.BaseAddress = new Uri("https://slow.example.com/"); client.Timeout = TimeSpan.FromMinutes(2);}).AddStandardResilienceHandler(opts =>{ opts.AttemptTimeout.Timeout = TimeSpan.FromSeconds(45); opts.TotalRequestTimeout.Timeout = TimeSpan.FromMinutes(3); opts.Retry.MaxRetryAttempts = 5; opts.Retry.BackoffType = DelayBackoffType.Exponential; opts.Retry.UseJitter = true;});Don’t override globally — the standard defaults are right for the typical case. Customise per-client where the upstream demands it.
When NOT to retry
The resilience pipeline retries on transient failures only:
- HTTP 408 Request Timeout
- HTTP 429 Too Many Requests (with
Retry-Afterrespected) - HTTP 500, 502, 503, 504
- Connection / DNS / socket exceptions
It does not retry on:
- HTTP 400, 401, 403, 404 — permanent client errors; retrying won’t change anything.
- HTTP 405, 406, 409 — semantic conflicts; retrying won’t help.
For specific 4xx codes that should be treated as transient (rare — e.g. some APIs return 423 Locked on transient contention), configure opts.Retry.ShouldHandle with a custom predicate.
Circuit breaker
The breaker opens when too many consecutive failures land in a short window. While open, requests fail fast with BrokenCircuitException instead of waiting for the timeout. After a cooldown, the breaker half-opens — one trial request decides whether to close (resume normal traffic) or stay open (downstream still broken).
This is what stops a flaky downstream from cascading into your threadpool. Watch breaker open / close events in your traces; opens that don’t half-close mean the downstream is genuinely down and needs human attention.
Hedging (for read-only calls)
Hedging fires N parallel requests; the first response wins, the rest are cancelled. Trades load for latency. Don’t enable globally — it doubles the downstream’s load — but for specific high-percentile-sensitive reads it can be worth it:
.AddStandardHedgingHandler(opts =>{ opts.Hedging.MaxHedgedAttempts = 2; // up to 2 parallel requests opts.Hedging.Delay = TimeSpan.FromMilliseconds(200); // wait 200ms before firing the hedge});Use it for paid third-party APIs only when their tail latency genuinely matters and they can absorb 2× the call volume.
Gotchas
HttpClient.Timeoutis overridden by the pipeline’sTotalRequestTimeout. Configure timeouts on the pipeline, not on the client.- Retry + non-idempotent requests = duplicate creates. If you POST
/api/v1/ordersand retry on 502, you might create two orders. Combine with Idempotency-Key on every retried POST. - Polly’s exceptions wrap downstream exceptions. When catching, look at
InnerExceptionto get the actual transport failure. - OpenTelemetry sees each attempt as its own span. A request that succeeds on the third try shows three spans in the trace, with two
error=trueparents. This is correct; ignore the noise in dashboards.
Related
- Idempotency — required when retrying non-idempotent operations.
- Webhooks module — the most visible consumer.
- Background jobs — Hangfire
[AutomaticRetry]complements HTTP-layer retry.