I encountered an interesting livelock situation that has to do with asynchrony.
Consider the code below: it livelocks and runs for about a minute even though the useful payload takes almost no time. The execution time is around one minute because we hit the thread pool's growth limit (roughly one new thread per second), so 60 blocked tasks need about a minute of thread injection; with 300 iterations it would run for around five minutes.
This is not the trivial deadlock where we synchronously wait on an asynchronous operation in an environment whose SynchronizationContext allows scheduling jobs on a single thread only (e.g. WPF, WebAPI). The code below reproduces the issue in a console application, where there is no explicit SynchronizationContext set and tasks are scheduled on the thread pool.
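For contrast, here is a minimal sketch of that trivial deadlock (hedged: it assumes a single-threaded UI SynchronizationContext such as WPF's, and the handler names are illustrative):

    // Sketch of the classic deadlock under a single-threaded UI
    // SynchronizationContext (e.g. WPF); not part of the repro below.
    void Button_Click(object sender, EventArgs e)
    {
        // Blocks the UI thread until LoadAsync completes...
        var text = LoadAsync().Result;
    }

    async Task<string> LoadAsync()
    {
        await Task.Delay(1000);
        // ...but this continuation is posted back to the same blocked UI
        // thread, so it never runs and .Result never returns: deadlock.
        return "done";
    }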
I know that the "solution" to this problem is "asynchrony all the way". In the real world we might not know that somewhere deep inside, the developer of SyncMethod suppresses asynchrony by blocking on it, unleashing such issues (even if they did the trick of replacing the SynchronizationContext so that it at least does not deadlock).
What are your suggestions for dealing with such an issue when "asynchrony all the way" is not an option? Is there anything besides the obvious "do not spawn so many tasks at once"?
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Threading;
    using System.Threading.Tasks;

    static class Program
    {
        static void Main()
        {
            var tasks = new List<Task>();
            for (int i = 0; i < 60; i++)
                tasks.Add(Task.Run(() => SyncMethod()));

            bool exit = false;
            Task.WhenAll(tasks).ContinueWith(t => exit = true);

            while (!exit)
            {
                Print($"Thread count: {Process.GetCurrentProcess().Threads.Count}");
                Thread.Sleep(1000);
            }
        }

        // Blocks a thread pool thread while waiting for the async operation.
        static void SyncMethod()
        {
            SomethingAsync().Wait();
        }

        static async Task SomethingAsync()
        {
            await Task.Delay(1);
            await Task.Delay(1); // extra puzzle: why does commenting out one of these delays partially resolve the issue?
            Print("async done");
        }

        static void Print(object obj)
        {
            // The original snippet used LINQPad's .Dump(); Console.WriteLine works in a plain console app.
            Console.WriteLine($"[{Thread.CurrentThread.ManagedThreadId}] {DateTime.Now} - {obj}");
        }
    }
Here is the output. Notice how all the async continuations are stuck for almost a minute and then all of a sudden execution proceeds:
    [12] 30.01.2018 23:34:36 - Thread count: 18
    [12] 30.01.2018 23:34:37 - Thread count: 32
    [12] 30.01.2018 23:34:38 - Thread count: 33   -- THREAD POOL STARTS TO GROW
    ...
    [12] 30.01.2018 23:35:18 - Thread count: 70
    [12] 30.01.2018 23:35:19 - Thread count: 71
    [12] 30.01.2018 23:35:20 - Thread count: 72   -- UNTIL ALL SCHEDULED TASKS CAN FIT
    [8] 30.01.2018 23:35:20 - async done          -- ALMOST A MINUTE AFTER START
    [8] 30.01.2018 23:35:20 - async done          -- THE CONTINUATIONS START TO GO THROUGH
    ...
    [61] 30.01.2018 23:35:20 - async done
    [10] 30.01.2018 23:35:20 - async done
Answering the original question:
By no means a solution for the root cause, but a quantitative remedy: we can adjust the thread pool using SetMinThreads, increasing the number of threads that are created without delay (that is, faster than the regular "injection rate", which on my setup is about one thread pool thread per second). The way it works in this setup is simple: we are wasting thread pool threads until the pool grows big enough to start executing the continuations. If we start with a big enough pool, we essentially eliminate the period where we are bound by the artificial injection rate, which tries to keep the thread count low (which makes sense, since the thread pool is designed to run CPU-bound work rather than threads blocked waiting on asynchronous operations). I should also leave a warning note: the documentation cautions that unnecessarily raising these values can itself cause performance problems.
https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool.setminthreads?view=netframework-4.8
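A minimal sketch of the remedy (my illustration; the value 70 is arbitrary and should be sized to the expected number of simultaneously blocked tasks):

    // Raise the worker-thread floor before the blocking tasks are queued, so
    // the pool creates threads on demand instead of injecting ~1 per second.
    // 70 is illustrative: 60 blocked tasks plus some headroom.
    ThreadPool.GetMinThreads(out int workerMin, out int ioMin);
    ThreadPool.SetMinThreads(Math.Max(workerMin, 70), ioMin);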
There is also an interesting support article where Microsoft recommends increasing the "min threads" for ASP.NET as a performance/reliability improvement in some scenarios:
https://support.microsoft.com/en-us/help/821268/contention-poor-performance-and-deadlocks-when-you-make-calls-to-web-s
Interestingly, the problem described in the question is not purely imaginary; it is real, and it happens in well-known and widely used software. An example from experience: IdentityServer3.
https://github.com/IdentityServer/IdentityServer3.EntityFramework/issues/101
The implementation that has this caveat (we had to rewrite it to work around the problem for our production scenario):
https://github.com/IdentityServer/IdentityServer3.EntityFramework/blob/master/Source/Core.EntityFramework/Serialization/ClientConverter.cs
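Paraphrased, the shape of that caveat looks roughly like the sketch below (my reconstruction from memory, not the actual IdentityServer3 source; the delegate stands in for the async client store):

    using System;
    using System.Threading.Tasks;
    using Newtonsoft.Json;

    // Json.NET converters are synchronous APIs, so an async store lookup ends
    // up blocked on with .Result inside deserialization -- the SyncMethod()
    // pattern from the repro, triggered once per deserialized client.
    class ClientConverterSketch : JsonConverter
    {
        private readonly Func<string, Task<object>> findClientByIdAsync;

        public ClientConverterSketch(Func<string, Task<object>> store)
            => findClientByIdAsync = store;

        public override bool CanConvert(Type objectType) => objectType == typeof(object);

        public override object ReadJson(JsonReader reader, Type objectType,
                                        object existingValue, JsonSerializer serializer)
        {
            var clientId = (string)reader.Value;
            return findClientByIdAsync(clientId).Result; // sync-over-async
        }

        public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
            => writer.WriteValue(value?.ToString());
    }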
Another article that explains the issue in detail:
https://blogs.msdn.microsoft.com/vancem/2018/10/16/diagnosing-net-core-threadpool-starvation-with-perfview-why-my-service-is-not-saturating-all-cores-or-seems-to-stall/
As for the strange behavior with a single Task.Delay, where some async invocations complete with each newly injected thread pool thread: it appears to be caused by continuation inlining together with the way Task.Delay and Timer are implemented. See this call stack; it shows that a newly created thread pool thread does some additional magic for .NET timers when it starts, before it begins processing the thread pool queue (see System.Threading.TimerQueue.AppDomainTimerCallback).