Microsoft.Extensions.Http.Resilience (Polly v8 under the hood) backs the kit’s outbound HTTP calls. The Web block ships an AddHeroResilience extension that attaches the standard resilience pipeline — retry with backoff on transient failures, a circuit breaker to fail fast when a downstream is consistently broken, and total + per-attempt timeouts to prevent hung requests from holding threadpool slots — to any IHttpClientBuilder, configured from one appsettings section.

How it’s wired

Resilience is opt-in per client, not global. The Webhooks module is the shipped example:

builder.Services.AddHttpClient("Webhooks")
    .AddHeroResilience(builder.Configuration);

AddHeroResilience reads the HttpResilienceOptions section and calls AddStandardResilienceHandler with those values. When you add your own outbound integration, do the same — name the client, chain the extension.

What the standard pipeline does

[Outer]      Total request timeout (30s default)
  ↓
[Retry]      Up to 3 attempts on transient errors with backoff
  ↓
[Breaker]    Circuit breaker — opens at 50% failure ratio, 5s break
  ↓
[Attempt]    Per-attempt timeout (10s default)
  ↓
  HttpClient.SendAsync

Configuration

{
  "HttpResilienceOptions": {
    "Enabled": true,                          // default — false skips the handler entirely
    "MaxRetryAttempts": 3,
    "MedianFirstRetryDelay": "00:00:01",      // first retry delay; backoff grows from here
    "TotalTimeout": "00:00:30",               // whole request incl. retries
    "AttemptTimeout": "00:00:10",             // each individual attempt
    "CircuitBreakerBreakDuration": "00:00:05",
    "CircuitBreakerFailureRatio": 0.5,
    "CircuitBreakerMinimumThroughput": 10
  }
}

These are the code defaults — the shipped appsettings.json doesn’t carry the section, so you only add it when tuning. One section configures every client that opted in via AddHeroResilience; for genuinely different upstreams (a slow third-party API), register that client with its own AddStandardResilienceHandler(opts => ...) lambda instead.

How resilience integrates with `[AutomaticRetry]`

Hangfire’s [AutomaticRetry] and the HTTP resilience pipeline are two separate retry layers. The kit uses both, deliberately:

HTTP-layer retry handles transient network blips (5xx, timeouts) within a single job attempt — fast, in-process, no job queue churn.
Job-layer retry handles longer outages — minutes-to-hours-scale backoff while the Hangfire job stays in the queue.

For the Webhooks module: each delivery attempt runs through the Polly pipeline (up to 3 in-process retries, seconds apart). If the attempt still fails, the Hangfire job fails and [AutomaticRetry(Attempts = 4, DelaysInSeconds = [30, 120, 600, 3600])] re-runs it — up to 5 job-level attempts spread over ~1 h 12 min.

When NOT to retry

The standard resilience handler retries on transient failures only:

HTTP 408 Request Timeout
HTTP 429 Too Many Requests
HTTP 500+ server errors
Connection / timeout exceptions (HttpRequestException, attempt timeouts)

It does not retry permanent client errors — 400, 401, 403, 404, 405, 409 — because retrying won’t change the answer. For specific 4xx codes that should be treated as transient (rare — e.g. an API that returns 423 Locked on transient contention), configure ShouldHandle with a custom predicate on a per-client AddStandardResilienceHandler registration.

Circuit breaker

The breaker opens when the failure ratio crosses CircuitBreakerFailureRatio (50%) over at least CircuitBreakerMinimumThroughput (10) requests. While open, requests fail fast with BrokenCircuitException instead of waiting for the timeout. After CircuitBreakerBreakDuration (5 s), the breaker half-opens — a trial request decides whether to close (resume normal traffic) or stay open (downstream still broken).

This is what stops a flaky downstream from cascading into your threadpool. Watch breaker open / close events in your traces; opens that don’t half-close mean the downstream is genuinely down and needs human attention.

Hedging (for read-only calls)

The kit doesn’t ship a hedging pipeline, but Microsoft.Extensions.Http.Resilience includes one if you need it. Hedging fires parallel requests; the first response wins, the rest are cancelled. Trades load for latency:

services.AddHttpClient("LatencySensitiveReads")
    .AddStandardHedgingHandler(opts =>
    {
        opts.Hedging.MaxHedgedAttempts = 2;                   // up to 2 parallel requests
        opts.Hedging.Delay = TimeSpan.FromMilliseconds(200);  // wait 200ms before firing the hedge
    });

Don’t enable it globally — it multiplies the downstream’s load. Reserve it for high-percentile-sensitive reads against upstreams that can absorb the extra volume.

Gotchas

Configure timeouts on the pipeline, not the client. The pipeline’s TotalTimeout is the effective bound; a mismatched HttpClient.Timeout just adds a second, confusing limit.
Retry + non-idempotent requests = duplicate creates. If you POST and retry on 502, you might create two resources. Combine with Idempotency-Key on every retried POST.
Look at InnerException. Polly’s timeout/breaker exceptions wrap the underlying transport failure.
OpenTelemetry sees each attempt as its own HTTP client span. A request that succeeds on the third try shows three spans in the trace. This is correct; ignore the noise in dashboards.

Idempotency — required when retrying non-idempotent operations.
Webhooks module — the shipped consumer.
Background jobs — Hangfire [AutomaticRetry] complements HTTP-layer retry.