I'm reading a file of thousands of non-English strings, many of them East Asian, with fgets, and then calling MultiByteToWideChar to convert each UTF-8 line to UTF-16:
/* src holds one NUL-terminated line read by fgets from the UTF-8 file */
WCHAR wstr[BUFSIZ] = { '\0' };
int result = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, src, -1, wstr, BUFSIZ);
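Since the call returns zero, a useful first step is to ask the API why it failed; a minimal sketch (the to_utf16 wrapper and its parameter names are illustrative, not from the code in question):

#include <windows.h>
#include <stdio.h>

/* Sketch: convert one line and report the failure reason, so invalid
   UTF-8 can be told apart from an undersized output buffer. */
static int to_utf16(const char *src, WCHAR *wstr, int cch)
{
    int result = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                     src, -1, wstr, cch);
    if (result == 0) {
        DWORD err = GetLastError();
        if (err == ERROR_NO_UNICODE_TRANSLATION)
            fprintf(stderr, "input is not valid UTF-8\n");
        else if (err == ERROR_INSUFFICIENT_BUFFER)
            fprintf(stderr, "wstr is too small for the converted line\n");
    }
    return result;
}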
This approach is working fine in nearly every case. The two strings for which it isn't working are:
我爱你 (read in by fgets as "我爱ä½")
コム (read in by fgets as "コãƒ")
In both cases, the call to MultiByteToWideChar returns zero, and the final character of wstr is garbage:
我爱� (trailing bytes 0xE4 0xBD)
コ� (trailing bytes 0xE3 0x83)
Is there some environment setting, or an alternative way of reading the text file, that would eliminate this problem?
I found the problem, thanks to Raymond Chen's observation that the number of bytes in the source string was incorrect for 我爱你 and コム.
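A quick way to verify that observation is to hex-dump each line as raw bytes before converting it; a small sketch (dump_bytes is a hypothetical helper):

#include <stdio.h>

/* Sketch: print each byte of the line fgets returned, in hex. For the two
   failing strings this shows a truncated trailing sequence (0xE4 0xBD for
   你, 0xE3 0x83 for ム) with the final 0xA0 byte missing. */
static void dump_bytes(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
        printf("%02X ", *p);
    putchar('\n');
}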
The code that I'm debugging trims trailing whitespace from each line read by fgets, and that is what corrupts these two strings: the UTF-8 encodings of 你 (0xE4 0xBD 0xA0) and ム (0xE3 0x83 0xA0) both end in the byte 0xA0, which is the Latin-1 non-breaking space and so was apparently classified as whitespace by the trimming code. Stripping it leaves a truncated multi-byte sequence, which is exactly what MultiByteToWideChar rejects under MB_ERR_INVALID_CHARS.
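The fix is to make the trim UTF-8-safe by stripping only ASCII whitespace; a sketch of that idea (rtrim_ascii is an illustrative replacement, not the original routine):

#include <string.h>

/* Sketch: trim only ASCII whitespace, so bytes >= 0x80 (which in UTF-8 are
   always lead or continuation bytes, including the 0xA0 that ends both
   你 and ム) are never stripped. */
static void rtrim_ascii(char *s)
{
    size_t len = strlen(s);
    while (len > 0) {
        unsigned char c = (unsigned char)s[len - 1];
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            s[--len] = '\0';
        else
            break;
    }
}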