I need to replace a string in a streaming HTTP response. The naive way of doing that would be:
using var reader = new StreamReader(input, leaveOpen: true);
var original = await reader.ReadToEndAsync();
// "new" is a reserved keyword in C#, so the arguments are named oldValue and newValue.
var replaced = original.Replace(oldValue, newValue, StringComparison.InvariantCultureIgnoreCase);
await output.WriteAsync(Encoding.UTF8.GetBytes(replaced));
This is very resource and memory intensive since the complete response must be read into memory before replacing the strings.
I've been looking at System.IO.Pipelines and the PipeReader. While this does give me efficient access to the stream, it works on bytes, which makes the conversion to char problematic when working with UTF-8, because a multi-byte character can be split across two buffers.
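One way around the byte-to-char problem might be System.Text.Decoder, which keeps the trailing bytes of an incomplete UTF-8 sequence between calls. The following is a minimal sketch, not a complete solution: the ChunkedUtf8 class and its Decode method are hypothetical names, and the byte arrays stand in for the buffers a PipeReader would hand out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, for illustration only.
public static class ChunkedUtf8
{
    // Decodes UTF-8 bytes that arrive in arbitrary chunks.
    public static string Decode(IEnumerable<byte[]> chunks)
    {
        // A Decoder is stateful: bytes of an incomplete character
        // are remembered until the next call supplies the rest.
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var result = new StringBuilder();

        foreach (byte[] chunk in chunks)
        {
            // GetCharCount takes the decoder's pending bytes into account.
            var chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
            int written = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
            result.Append(chars, 0, written);
        }

        return result.ToString();
    }
}
```

Calling Encoding.UTF8.GetString on each chunk separately would instead turn the split character into replacement characters, which is the problem described above.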
One method I've seen is to use ReadLineAsync on the StreamReader, but I cannot know if the stream will contain any newlines.
Another method I've seen is to use a queue, but even that seems clumsy.
So my question is: what is the best way to replace text in a stream without reading the full stream into memory?
If you have an infinite Stream like a NetworkStream, replacing string tokens on-the-fly would make a lot of sense. But because you are processing a finite stream of which you need the complete content, such filtering doesn't make sense because of the performance impact.
My argument is that you would have to use buffering. The size of the buffer of course restricts the number of characters you can process. Assuming you use a ring buffer or a kind of queue, you would have to remove one character to append a new one. This leads to a lot of drawbacks when compared to processing the complete content.
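For comparison, a buffered on-the-fly replacement could be sketched roughly as follows. This is only an illustration of the buffering idea, not the solution proposed in this answer. The StreamingReplace class, its ReplaceAsync method and the chunkSize parameter are hypothetical names, and the sketch assumes ordinal, case-insensitive matching so that a match always has the same length as the search text. It carries the last oldValue.Length - 1 characters over to the next read so that a match spanning two chunks is not missed.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, for illustration only.
public static class StreamingReplace
{
    public static async Task ReplaceAsync(
        TextReader input, TextWriter output,
        string oldValue, string newValue, int chunkSize = 4096)
    {
        if (string.IsNullOrEmpty(oldValue))
            throw new ArgumentException("Search text must not be empty.", nameof(oldValue));

        var chunk = new char[chunkSize];
        string carry = string.Empty; // unflushed tail of the previous window
        int read;

        while ((read = await input.ReadAsync(chunk, 0, chunk.Length)) > 0)
        {
            string window = carry + new string(chunk, 0, read);
            int position = 0;
            int match;

            // Replace every complete match inside the current window.
            while ((match = window.IndexOf(oldValue, position, StringComparison.OrdinalIgnoreCase)) >= 0)
            {
                await output.WriteAsync(window.Substring(position, match - position));
                await output.WriteAsync(newValue);
                position = match + oldValue.Length;
            }

            // Hold back the characters that could still start a match
            // that is completed by the next chunk; flush the rest.
            int keep = Math.Min(oldValue.Length - 1, window.Length - position);
            await output.WriteAsync(window.Substring(position, window.Length - position - keep));
            carry = window.Substring(window.Length - keep);
        }

        // End of input: the held-back tail cannot contain a full match.
        await output.WriteAsync(carry);
    }
}
```

Memory use is bounded by the chunk size plus the length of the search text, but every chunk costs extra string allocations and a separate search, which is the kind of overhead weighed in this answer.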
Pros & Cons
string allocations that occur during the search & replace.
I stop here, as the most relevant performance costs, those of the search & replace itself, are lower for the full-content solution. For the real-time search & replace we basically would have to implement our own algorithm that has to compete against the .NET search and replace algorithms. No problem, but considering the effort and the final use case I would not waste any time on that.
An efficient solution could implement a custom TextReader, an advanced StreamReader that operates on a StringBuilder to search and replace characters. While StringBuilder offers a significant performance advantage over string search & replace, it does not allow complex search patterns like word boundaries. Word boundaries are only possible if the pattern explicitly includes the bounding characters.
For example, replacing "int" in the input "internal int " with "pat" produces "paternal pat ". If we want to replace only the word "int" in "internal int ", we would have to use a regular expression. Because regular expressions only operate on string, we have to pay with efficiency.
The following example implements a StringReplaceStreamReader that extends TextReader to act as a specialized StreamReader. For best performance, tokens are replaced after the complete stream has been read.
For brevity it only supports the ReadToEndAsync, Read and Peek methods.
It supports simple search where the search pattern is simply matched against the input (called simple-search).
Then it also supports two variants of regular expression search and replace for more advanced search and replace scenarios.
The first variant is based on a set of key-value pairs while the second variant uses a regex pattern provided by the caller.
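To illustrate the difference between simple-search and the key-value regex variant, here is a minimal sketch. It is not the StringReplaceStreamReader implementation: the ReplacementSketch class and both method names are hypothetical, and the regex variant assumes that every key is a plain word (starts and ends with a word character) so that \b behaves as a word boundary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ReplacementSketch
{
    // Simple search: one StringBuilder.Replace pass per dictionary entry.
    // Matches anywhere, including inside longer words.
    public static string ReplaceSimple(string input, IReadOnlyDictionary<string, string> replacements)
    {
        var builder = new StringBuilder(input);
        foreach (var pair in replacements)
        {
            builder.Replace(pair.Key, pair.Value);
        }
        return builder.ToString();
    }

    // Regex search: all keys are combined into one alternation and
    // matched as whole words in a single pass over the input.
    public static string ReplaceWholeWords(string input, IReadOnlyDictionary<string, string> replacements)
    {
        if (replacements.Count == 0)
        {
            return input;
        }

        string alternatives = string.Join("|", replacements.Keys.Select(key => Regex.Escape(key)));
        string pattern = @"\b(?:" + alternatives + @")\b";

        // The evaluator looks up the replacement for whichever key matched.
        return Regex.Replace(input, pattern, match => replacements[match.Value]);
    }
}
```

With the single entry "int" -> "pat" and the input "internal int ", ReplaceSimple returns "paternal pat " while ReplaceWholeWords returns "internal pat ".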
Because simple-search involves iterating over the source dictionary entries and making multiple passes over the input (one for each entry), this search mode is expected to be the slowest algorithm, although the replace using a StringBuilder itself is actually faster. Under these circumstances, the Regex search & replace is expected to be significantly faster than the simple search and replace using the StringBuilder, as it can process the input in a single pass.
The StringReplaceStreamReader search behavior is configurable via the constructor.
Usage example
Implementation
SearchMode
StringReplaceStreamReader.cs