I'm trying to make a piece of code run faster. The code already uses async/await, but it's still slow.
So I changed my foreach to use the new IAsyncEnumerable. However, I gained no performance from this, and the code appears to run sequentially, which surprised me. I thought await foreach would run each iteration on its own thread.
Here's my attempt at speeding up the code.
var bag = new ConcurrentBag<IronPdf.PdfDocument>(); // probably don't need a ConcurrentBag
var foos = _dbContext.Foos;
await foreach (var fooPdf in GetImagePdfs(foos))
{
    bag.Add(fooPdf);
}

private async IAsyncEnumerable<IronPdf.PdfDocument> GetImagePdfs(IEnumerable<Foo> foos)
{
    foreach (var foo in foos)
    {
        var imagePdf = await GetImagePdf(foo);
        yield return imagePdf;
    }
}

private async Task<IronPdf.PdfDocument> GetImagePdf(Foo foo)
{
    using var imageStream = await _httpService.DownloadAsync(foo.Id);
    var imagePdf = await _pdfService.ImageToPdfAsync(imageStream);
    return imagePdf;
}
using IronPdf;

public class PdfService
{
    // this method is quite slow
    public async Task<PdfDocument> ImageToPdfAsync(Stream imageStream)
    {
        var imageDataURL = Util.ImageToDataUri(Image.FromStream(imageStream));
        var html = $@"<img style=""max-width: 100%; max-height: 70%;"" src=""{imageDataURL}"">";
        using var renderer = new HtmlToPdf(new PdfPrintOptions()
        {
            PaperSize = PdfPrintOptions.PdfPaperSize.A4,
        });
        return await renderer.RenderHtmlAsPdfAsync(html);
    }
}
I also gave Parallel.ForEach a try:
Parallel.ForEach(foos, async foo =>
{
    var imagePdf = await GetImagePdf(foo);
    bag.Add(imagePdf);
});
However, I keep reading that I shouldn't use async with it, so I'm not sure what to do. Also, the IronPdf library crashes when I do it that way.
The problem with your foreach and await foreach approaches is that they execute sequentially, even though they take advantage of the async and await pattern. Essentially, await does exactly that: it awaits.
With regard to Parallel.ForEach, your suspicions are correct: it's not suitable for async methods and IO-bound workloads. Parallel.ForEach takes an Action delegate, and giving an async lambda to an Action actually just creates an async void, with the consequence that each task runs unobserved (which has several disadvantages).
There are many approaches to take from here, but the simplest is to start each task hot, project them to a collection, and await them all to completion. This way you are letting the IO-bound workloads offload (term used loosely) to an IO Completion Port, thus allowing any potential thread to go back to the thread pool to be reused efficiently by the Task Scheduler until the IO work completes.
Assuming there are no shared resources, just project the started tasks to an IEnumerable<Task<PdfDocument>> and use Task.WhenAll.
In that scenario, when a LINQ Select over foos enumerates the async method GetImagePdf, each Task is started hot, and the Task Scheduler takes care of scheduling any threads that are needed from the thread pool. As soon as any code awaits an IO job, a callback is registered with the operating system and the thread goes back to the pool to be reused, and so on. Task.WhenAll waits for all the tasks to complete or fault, then returns a collection of the results.
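The Select plus Task.WhenAll approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the original code: GetImagePdfAsync is a hypothetical stand-in for GetImagePdf that simulates IO-bound work with Task.Delay and returns a string instead of a PdfDocument, and the integer ids stand in for the Foo entities. It also assumes, as stated above, that the tasks share no resources.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical input: ids standing in for the Foo entities from the question.
var fooIds = Enumerable.Range(1, 5).ToList();
var stopwatch = Stopwatch.StartNew();

// Select invokes the async method for each id as it is enumerated.
// ToList forces that enumeration exactly once, so every task has been
// started before the first await below.
var tasks = fooIds.Select(id => GetImagePdfAsync(id)).ToList();

// Completes when all tasks have completed; the results array is in the
// same order as the tasks, regardless of which task finished first.
string[] results = await Task.WhenAll(tasks);

stopwatch.Stop();
Console.WriteLine(string.Join(", ", results));
Console.WriteLine($"Elapsed: {stopwatch.ElapsedMilliseconds} ms");

// Hypothetical stand-in for GetImagePdf: simulates IO-bound work with a delay.
static async Task<string> GetImagePdfAsync(int fooId)
{
    await Task.Delay(200); // pretend to download and convert the image
    return $"pdf-{fooId}";
}
```

Because the five simulated delays overlap, the elapsed time should be close to a single delay rather than the sum of all five, which is the difference from the sequential await foreach loop. In the question's code, the equivalent would be to project foos through GetImagePdf and await Task.WhenAll on the result.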