Consider this code:
using var mem = new MemoryStream();
await using var writer = new StreamWriter(mem, Encoding.UTF8);
await writer.WriteLineAsync("Test");
await writer.FlushAsync();
mem.Position = 0;
Then this code throws:
var x = Encoding.UTF8.GetString(mem.ToArray());
if (x[0] != 'T') throw new Exception("Bom is present in string");
Because the BOM is present, which doesn't make sense to me, since GetString should decode the stream into a decoded string.
This code works as intended and does not include the BOM:
using var reader = new StreamReader(mem, Encoding.UTF8);
var x = await reader.ReadToEndAsync();
if (x[0] != 'T') throw new Exception("Bom is present in string");
Does anyone know Microsoft's reasoning behind this? To me it seems strange to keep a BOM in a method called GetString.
It's important to remember that the Encoding class only deals with the encoding, not streams, files or packets. GetString converts the full or partial contents of a byte buffer into a Unicode string. It may be called on the entire buffer, or on just a part of it with GetString(byte[] bytes, int index, int count).

GetString neither generates nor handles BOM bytes. The bytes were emitted by StreamWriter because the encoding used explicitly specifies it. The StreamWriter.Flush() source code shows that the method explicitly emits the output of Encoding.GetPreamble() to the stream.

GetBytes generates the bytes for the actual string contents. Its inverse, GetString, doesn't handle BOMs either. Those are handled by the StreamReader class or by any custom code that reads raw bytes.

The Encoding.UTF8 property remarks state that the property returns a UTF8Encoding object that provides a BOM.
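To make the split of responsibilities concrete, here is a minimal sketch (not taken from the framework source) that compares what GetBytes, GetPreamble and StreamWriter each produce, and then skips the preamble by hand before calling GetString. The variable names and the manual offset check are illustrative assumptions. In practice StreamReader does this detection for you.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// GetBytes encodes only the string contents; it never adds a preamble.
byte[] contentOnly = Encoding.UTF8.GetBytes("Test");

// The preamble (BOM) is exposed separately by the encoding object.
byte[] preamble = Encoding.UTF8.GetPreamble();
Console.WriteLine(BitConverter.ToString(preamble)); // EF-BB-BF

// A StreamWriter created with Encoding.UTF8 writes the preamble before the text.
byte[] written;
using (var mem = new MemoryStream())
{
    using (var writer = new StreamWriter(mem, Encoding.UTF8))
    {
        writer.Write("Test");
    }
    // ToArray still works after the writer has closed the stream.
    written = mem.ToArray();
}

// GetString decodes every byte it is given, so the preamble becomes U+FEFF.
string raw = Encoding.UTF8.GetString(written);

// Hand-rolled preamble check (illustrative): skip the BOM bytes if present.
bool hasPreamble = written.Take(preamble.Length).SequenceEqual(preamble);
int offset = hasPreamble ? preamble.Length : 0;
string stripped = Encoding.UTF8.GetString(written, offset, written.Length - offset);

Console.WriteLine(raw[0] == '\uFEFF');
Console.WriteLine(stripped);
```

The index/count overload of GetString is what lets the caller decide where the text starts, which is why the preamble handling lives in the reader rather than in the encoding.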
StreamWriter uses UTF-8 without a BOM when no encoding is specified, in both .NET Framework and .NET Core.
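As a rough illustration of that default, the sketch below writes the same text with three writers and compares the byte counts. The WriteWith helper is a hypothetical name introduced only for this example. The third variant, passing a UTF8Encoding constructed with encoderShouldEmitUTF8Identifier set to false, is an addition not mentioned above: it is one way to request BOM-free UTF-8 explicitly.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: writes "Test" through the supplied writer and returns the raw bytes.
static byte[] WriteWith(Func<Stream, StreamWriter> createWriter)
{
    var mem = new MemoryStream();
    using (var writer = createWriter(mem))
    {
        writer.Write("Test");
    }
    // ToArray is still valid after the writer has closed the stream.
    return mem.ToArray();
}

// No encoding specified: UTF-8 without a BOM.
byte[] defaultBytes = WriteWith(s => new StreamWriter(s));

// Encoding.UTF8 specified: the preamble is written first.
byte[] bomBytes = WriteWith(s => new StreamWriter(s, Encoding.UTF8));

// Explicit BOM-free UTF-8 encoding.
byte[] noBomBytes = WriteWith(
    s => new StreamWriter(s, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false)));

Console.WriteLine(defaultBytes.Length);
Console.WriteLine(bomBytes.Length);
Console.WriteLine(noBomBytes.Length);
```

With a BOM-free writer, Encoding.UTF8.GetString(mem.ToArray()) from the question would start with 'T', because there are no preamble bytes left to decode.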