I'm sorry if this is a dumb question, but I can't really figure this out, and I bet it is much simpler than I think.
I have a byte[] array which contains several Unicode strings. Each char clearly takes 2 bytes, each string is delimited by two 00 bytes (00 00), and a doubled 00 00 marks the end of it all.
When I try to use UnicodeEncoding.Unicode.GetString(myBuffer) I do get the first string, but once the delimiter bytes are reached I start getting garbage all around.
Right now I am parsing byte by byte and then concatenating things, but I am sure there has to be a better way to do this.
I was wondering if I should try to find the "position" of the delimiter bytes and then limit the GetString method to that length? But if so, how do you find the position of 2 specific bytes in a byte array?
The example byte array looks like this:
Hex View
00000000 73 00 74 00 72 00 31 00 00 00 73 00 74 00 72 00 s.t.r.1...s.t.r.
00000010 32 00 00 00 73 00 74 00 72 00 33 00 00 00 00 00 2...s.t.r.3.....
So your buffer is valid little-endian UTF-16 data. Those "double 00 bytes" are just the NUL character, or `\0`. `Encoding.Unicode.GetString(myBuffer)` will actually decode the whole buffer correctly, but the result will have embedded NUL characters delimiting each substring. That is fine, because `\0` is just like any other character. This isn't C.

The sample code below uses `Console.WriteLine` to signify "use the substring", but feel free to substitute whatever is appropriate.

First approach: decode the whole thing
If you split by `\0` after decoding, you can get all the substrings, removing empty entries to get rid of those final NULs.

Alternatively, you can search for the first NUL if you want.
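Both variants of the first approach can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: `myBuffer` is rebuilt here from the question's hex dump, and treating an empty substring as the end-of-list marker is an assumption based on the format described in the question.

```csharp
using System;
using System.Text;

// Sample data matching the question's hex dump: "str1", "str2", "str3",
// each followed by NUL, plus one extra NUL marking the end of the list.
byte[] myBuffer = Encoding.Unicode.GetBytes("str1\0str2\0str3\0\0");

// Decode the whole buffer; the NULs survive as ordinary characters.
string decoded = Encoding.Unicode.GetString(myBuffer);

// Variant 1: split on NUL and drop the empty entries left by the trailing NULs.
string[] parts = decoded.Split(new[] { '\0' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string part in parts)
    Console.WriteLine(part);

// Variant 2: search for each NUL in turn with IndexOf.
int start = 0;
while (start < decoded.Length)
{
    int nul = decoded.IndexOf('\0', start);
    if (nul < 0) nul = decoded.Length;   // tolerate a missing final terminator
    if (nul == start) break;             // empty substring: assumed end-of-list marker
    Console.WriteLine(decoded.Substring(start, nul - start));
    start = nul + 1;
}
```

The split variant is the shortest, but it allocates every substring up front; the `IndexOf` variant lets you stop early, for example after the first string.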
Second approach: split, then decode
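The paragraph that follows describes two variants of this approach: scanning the bytes for the terminator, and reinterpreting the bytes as a span of `char`. As a hedged sketch, they might look like this. The sample buffer, the list names `fromBytes` and `fromSpan`, and the rule that an empty substring ends the list are assumptions made for illustration. The span variant also assumes a little-endian machine, since it reinterprets the bytes in place instead of decoding them.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using System.Text;

byte[] myBuffer = Encoding.Unicode.GetBytes("str1\0str2\0str3\0\0");

// Variant 1: find each 00 00 pair in the bytes, then decode only that slice.
var fromBytes = new List<string>();
int start = 0;
while (start + 1 < myBuffer.Length)
{
    // Step two bytes at a time so a 00 00 that straddles two characters
    // (e.g. 73 00 | 00 01) is not mistaken for a terminator.
    int end = start;
    while (end + 1 < myBuffer.Length && (myBuffer[end] != 0 || myBuffer[end + 1] != 0))
        end += 2;

    if (end == start) break;   // empty substring: assumed end-of-list marker

    fromBytes.Add(Encoding.Unicode.GetString(myBuffer, start, end - start));
    start = end + 2;           // skip past the terminator
}

// Variant 2: view the bytes as chars (no decoding, no validation).
var fromSpan = new List<string>();
ReadOnlySpan<char> chars = MemoryMarshal.Cast<byte, char>(new ReadOnlySpan<byte>(myBuffer));
while (!chars.IsEmpty)
{
    int nul = chars.IndexOf('\0');
    if (nul < 0) nul = chars.Length;   // tolerate a missing final terminator
    if (nul == 0) break;               // empty substring: assumed end-of-list marker
    fromSpan.Add(chars.Slice(0, nul).ToString());
    if (nul == chars.Length) break;
    chars = chars.Slice(nul + 1);
}

foreach (string s in fromBytes) Console.WriteLine(s);
foreach (string s in fromSpan) Console.WriteLine(s);
```

Searching on two-byte boundaries also answers the question of how to find the position of two specific bytes: for UTF-16 data the terminator can only start at an even offset relative to the start of the text.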
If you don't want to do it all in one go because you have to process a lot of data at once, then you could just find the next `0 0` byte sequence and decode from there.

Alternatively, cast to a span of `char`, which would allow you to use `ToString()` on the span to get a string, skipping the decoding step. This assumes the data is all valid text (ultimately, all you're doing is skipping the validation). Up to you.

Third approach: reading from a stream, character by character
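A rough sketch of this approach, which the next paragraph motivates, might look like the following. The `MemoryStream` is only a stand-in for whatever stream the data really comes from, and the `found` list and the empty-substring end marker are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

byte[] myBuffer = Encoding.Unicode.GetBytes("str1\0str2\0str3\0\0");

// Stand-in source; in practice this would be a FileStream, NetworkStream, etc.
using var stream = new MemoryStream(myBuffer);
using var reader = new StreamReader(stream, Encoding.Unicode);

var found = new List<string>();
var current = new StringBuilder();
int c;
while ((c = reader.Read()) != -1)      // Read() returns -1 at end of stream
{
    if (c != '\0')
    {
        current.Append((char)c);       // still inside a substring
        continue;
    }

    if (current.Length == 0) break;    // empty substring: assumed end-of-list marker

    found.Add(current.ToString());
    current.Clear();
}

// Keep a trailing substring if the stream ended without a terminator.
if (current.Length > 0) found.Add(current.ToString());

foreach (string s in found) Console.WriteLine(s);
```

The `StreamReader` does the UTF-16 decoding, so the loop only ever sees whole characters and never has to worry about byte alignment.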
But then if you have that much data on hand, you should probably be reading from a stream, using a `StreamReader` to decode on the go.

Fourth approach: reading from a stream in batches
An optimisation to the code above would be to call `Read` with a block of chars and then piece the data back together yourself.

That should get you decent results.
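The batched read described above could be sketched like this. It is an illustration under assumptions, not the original answer's code: the block size is deliberately tiny so that substrings straddle block boundaries in the sample (something like 4096 would be more realistic), and the `MemoryStream`, the `found` list, and the empty-substring end marker are all hypothetical choices.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

byte[] myBuffer = Encoding.Unicode.GetBytes("str1\0str2\0str3\0\0");
using var reader = new StreamReader(new MemoryStream(myBuffer), Encoding.Unicode);

var found = new List<string>();
var current = new StringBuilder();
char[] block = new char[4];            // tiny on purpose; use a larger block in practice
bool done = false;
int read;
while (!done && (read = reader.Read(block, 0, block.Length)) > 0)
{
    int start = 0;
    while (start < read)
    {
        int nul = Array.IndexOf(block, '\0', start, read - start);
        if (nul < 0)
        {
            // No terminator in the rest of this block: keep the partial substring.
            current.Append(block, start, read - start);
            break;
        }

        current.Append(block, start, nul - start);
        if (current.Length == 0) { done = true; break; }   // assumed end-of-list marker

        found.Add(current.ToString());
        current.Clear();
        start = nul + 1;               // continue after the terminator
    }
}

foreach (string s in found) Console.WriteLine(s);
```

The `StringBuilder` carries a partial substring from one block to the next, which is the "piecing back together" step.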
Now, those two final approaches use a `StringBuilder` to build the substrings, but you don't have to do it that way: you can send those characters elsewhere (maybe you're writing them to a file).