I have an expensive method to call for creating a batch of source items:
private Task<List<SourceItem>> GetUnprocessedBatch(int batchSize)
{
//impl
}
I want to populate new items only when there is no item to process(or it falls below a certain threshold). I couldn't figure out which Source method to use so far.
I have implemented a crude stream that would keep returning new items:
public class Stream
{
private readonly Queue<SourceItem> scrapeAttempts;
private int batchSize = 100;
private int minItemCount = 10;
public Stream()
{
scrapeAttempts = new Queue<SourceItem>();
}
public async Task<SourceItem> Next()
{
if (scrapeAttempts.Count < minItemCount)
{
var entryScrapeAttempts = await GetUnprocessedBatch(batchSize);
entryScrapeAttempts.ForEach(attempt => scrapeAttempts.Enqueue(attempt));
}
return scrapeAttempts.Dequeue();
}
}
I expected Source.Task
would work but it looks like it calls it only once. How can I create a source for this scenario?
So, conceptually what you want is a Source stage, that fetches elements asynchronously in batches, buffers the batch and propagates events downstream one by one. When the buffer is close to being empty, we want to eagerly call the next fetch on the side thread (but not more than once), so it could complete while we're emptying current batch.
This sort of behavior will require building a custom GraphStage. One that could look like this: