I have attempted to upgrade to Polly V8, but now the circuit breaker's state gets stuck at Open indefinitely.
My classic AdvancedCircuitBreakerAsync policy continued to work after I downloaded the V8 Polly NuGet package. I then tried to adopt the substantial V8 API changes, including ResiliencePipelineBuilder, CircuitBreakerStrategyOptions and CircuitBreakerStateProvider, and now I am experiencing the stuck open circuit problem.
The V8 problem only occurs when I skip repeated calls to _resiliencePipeline.ExecuteAsync(...) with the following logic:
    if (_circuitBreakerStateProvider.CircuitState == CircuitState.Open)
    {
        // return fallback value indicating remote service is unavailable but
        // the CircuitState never progresses to HalfOpen
        return null;
    }
    await _resiliencePipeline.ExecuteAsync( ... );
The equivalent logic with a classic circuit breaker works as expected:
    if (_breakerPolicy.CircuitState == CircuitState.Open)
    {
        return null;
    }
    // after the configured durationOfBreak timespan expires we get here with circuit at HalfOpen
    await _breakerPolicy.ExecuteAsync( ... );
It seems that the circuit state will not progress from Open to HalfOpen without further invocation of _resiliencePipeline.ExecuteAsync().
Update #1
As requested by @Peter Csala.
Here is my pipeline config. It is designed for a fail-fast requirement in a low-traffic situation. There is no direct DI configuration for the ResiliencePipeline. The pipeline is declared and held in an application RpcFacade class that is itself a singleton. A breakpoint confirms the pipeline is built only once, and all testing is single-user manual UI testing of a Blazor Server application.
    services.AddSingleton<RpcFacade>();
    _breakerState = new CircuitBreakerStateProvider();
    var builder = new ResiliencePipelineBuilder<List<LiveAgentInfo>>().AddCircuitBreaker( new CircuitBreakerStrategyOptions<List<LiveAgentInfo>>
    {
        FailureRatio = 0.1,
        SamplingDuration = TimeSpan.FromSeconds( 10 ),
        MinimumThroughput = 2,
        BreakDuration = TimeSpan.FromSeconds( 15 ),
        ShouldHandle = new PredicateBuilder<List<LiveAgentInfo>>().Handle<RpcException>(),
        StateProvider = _breakerState
    } );
Update #2
I have enabled additional Polly logging. This logs an entry "Resilience event occurred. EventName: 'OnCircuitOpened'" as the circuit transitions from Closed to Open on the second failing gRPC call to a remote gRPC service that is not reachable.
Logging of _circuitBreakerStateProvider.CircuitState just before the _resiliencePipeline.ExecuteAsync( ... ) confirms the Closed to Open transition so the _circuitBreakerStateProvider instance is maintaining a valid observation of the CircuitBreaker's internal state.
Update #3
Further testing reveals another insight. In the original code above I showed how I was returning a fallback value when CircuitState == CircuitState.Open in order to avoid calling _resiliencePipeline.ExecuteAsync(...) during the Open 15 second window.
If I always call _resiliencePipeline.ExecuteAsync( ... ), even when the circuit state is Open, then I get BrokenCircuitExceptions during the open window. After 15 seconds the circuit breaker lets through a remote call, which triggers an RpcException. At this point I see the entry "Resilience event occurred. EventName: 'OnCircuitHalfOpened'" in the log.
It seems the circuit breaker only identifies that it has reached the HalfOpen state during _resiliencePipeline.ExecuteAsync( ... ). It then makes the external gRPC call expressed in my lambda, which fails, and the state returns to Open. From the external perspective of the CircuitBreakerStateProvider the state appears stuck at Open.
As a workaround, I can return a fallback value in my BrokenCircuitException catch block. This fails fast and is the outcome I wanted.
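A minimal sketch of that workaround, assuming the Polly V8 package is referenced. The AgentClient class, the GetAgentsAsync method and the remoteCall delegate are hypothetical names used only for illustration, and List<string> stands in for List<LiveAgentInfo>:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical wrapper: always goes through the pipeline and turns a
// rejected call (circuit Open) into a fallback value.
public sealed class AgentClient
{
    private readonly ResiliencePipeline<List<string>> _pipeline;
    private readonly Func<CancellationToken, ValueTask<List<string>>> _remoteCall;

    public AgentClient(
        ResiliencePipeline<List<string>> pipeline,
        Func<CancellationToken, ValueTask<List<string>>> remoteCall)
    {
        _pipeline = pipeline;
        _remoteCall = remoteCall;
    }

    public async Task<List<string>?> GetAgentsAsync(CancellationToken cancellationToken = default)
    {
        try
        {
            // No CircuitState check up front: calling ExecuteAsync is what gives
            // the breaker the chance to move from Open to HalfOpen.
            return await _pipeline.ExecuteAsync(_remoteCall, cancellationToken);
        }
        catch (BrokenCircuitException)
        {
            // The circuit is Open, so the breaker rejected the call without
            // touching the remote service. Fail fast with the fallback value.
            return null;
        }
    }
}
```

Exceptions thrown by the remote call itself (for example the RpcException of the trial call in HalfOpen) still propagate in this sketch and would need their own handling.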
TL;DR: V7's CircuitState property's getter is more complex than V8's.
In order to understand why V7 does and why V8 does not transition from Open into HalfOpen state without invoking ExecuteAsync, we need to look a bit under the hood. It won't be painful, I promise.
In both cases we have a controller class which is stateful and does the heavy lifting. The policy/strategy uses the controller to ask for the state transitions.
V7
Here we have a CircuitStateController with a bunch of fields.
The important ones from this question's perspective:
- _blockedTill captures a date time until when the CB must remain in Open state before it could transition into HalfOpen
- _durationOfBreak
- _onHalfOpen is called ONLY when the CircuitState is evaluated
Why is this important from the question's perspective? Because in your early exit condition you are directly asking the Circuit Breaker to please re-evaluate the circuit state (_breakerPolicy.CircuitState == CircuitState.Open). That's why it transitions from Open to HalfOpen.
This also means that V7 does not transition from Open to HalfOpen automatically. If your CB breaks and you don't assess the CircuitState either directly or via ExecuteAsync, it will remain in Open state (and your onHalfOpen won't be triggered).
V8
In the new version the _onHalfOpen user delegate is called only from the ScheduleHalfOpenTask method, and this method is only called from OnActionPreExecuteAsync.
As you have already figured out, OnActionPreExecuteAsync is called ONLY by the strategy whenever its ExecuteCore is being executed.
The CircuitBreakerStateProvider does not do much whenever you retrieve the circuit state: it invokes a method that is specified inside the Initialize call (from the strategy).
The controller's CircuitState property's getter is super simple. It does not perform any check whether it should transition to HalfOpen or not.
Summary
- V7 transitions from Open to HalfOpen because you access the CircuitState and the getter might change the state of the circuit breaker.
- V8 does not transition from Open to HalfOpen automatically because the state provider simply returns the current state of the controller and does NOT induce any state change.
I hope it was not painful and the description clarified certain things. :)
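To make the difference concrete, here is a deliberately simplified toy model of the two getters. It is not Polly's source code: the class names LazyGetterController and PlainGetterController, the Break and OnActionPreExecute methods and the injected clock are all assumptions made for illustration.

```csharp
using System;

public enum CircuitState { Closed, Open, HalfOpen }

// Toy model of the V7 behaviour: reading CircuitState can change the state.
public sealed class LazyGetterController
{
    private readonly Func<DateTime> _utcNow;
    private CircuitState _state = CircuitState.Closed;
    private DateTime _blockedTill;

    public LazyGetterController(Func<DateTime> utcNow) => _utcNow = utcNow;

    public void Break(TimeSpan durationOfBreak)
    {
        _state = CircuitState.Open;
        _blockedTill = _utcNow() + durationOfBreak;
    }

    public CircuitState CircuitState
    {
        get
        {
            // The getter itself re-evaluates the break window.
            if (_state == CircuitState.Open && _utcNow() >= _blockedTill)
            {
                _state = CircuitState.HalfOpen; // the onHalfOpen delegate would fire here
            }
            return _state;
        }
    }
}

// Toy model of the V8 behaviour: the getter only reports the stored state.
public sealed class PlainGetterController
{
    private readonly Func<DateTime> _utcNow;
    private CircuitState _state = CircuitState.Closed;
    private DateTime _blockedTill;

    public PlainGetterController(Func<DateTime> utcNow) => _utcNow = utcNow;

    public void Break(TimeSpan breakDuration)
    {
        _state = CircuitState.Open;
        _blockedTill = _utcNow() + breakDuration;
    }

    // No side effects: this is all the state provider ever sees.
    public CircuitState CircuitState => _state;

    // Stand-in for the pre-execute hook that runs only as part of an execution.
    public void OnActionPreExecute()
    {
        if (_state == CircuitState.Open && _utcNow() >= _blockedTill)
        {
            _state = CircuitState.HalfOpen;
        }
    }
}
```

With both toy controllers broken for 15 seconds and the clock advanced past that window, reading CircuitState on the first one moves it to HalfOpen, while the second keeps reporting Open until OnActionPreExecute runs. That mirrors the early exit check in the question: in V8 the check never gives the breaker a chance to leave Open.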