I'm reading a file of thousands of non-English strings, many of them East Asian, with fgets, and then calling MultiByteToWideChar to convert each UTF-8 line to UTF-16:
/* src holds one NUL-terminated line read by fgets from the UTF-8 file */
WCHAR wstr[BUFSIZ] = { '\0' };
int result = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, src, -1, wstr, BUFSIZ);
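Since the call returns zero, a useful first step is to ask the API why it failed; a minimal sketch (the to_utf16 wrapper and its parameter names are illustrative, not from the code in question):

#include <windows.h>
#include <stdio.h>

/* Sketch: convert one line and report the failure reason, so invalid
   UTF-8 can be told apart from an undersized output buffer. */
static int to_utf16(const char *src, WCHAR *wstr, int cch)
{
    int result = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                     src, -1, wstr, cch);
    if (result == 0) {
        DWORD err = GetLastError();
        if (err == ERROR_NO_UNICODE_TRANSLATION)
            fprintf(stderr, "input is not valid UTF-8\n");
        else if (err == ERROR_INSUFFICIENT_BUFFER)
            fprintf(stderr, "wstr is too small for the converted line\n");
    }
    return result;
}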
This approach is working fine in nearly every case. The two strings for which it isn't working are:
我爱你 (read in by fgets as "我爱ä½")
コム (read in by fgets as "コãƒ")
In both cases, the call to MultiByteToWideChar returns zero, and the final character of wstr is garbage:
我爱� (trailing bytes 0xE4 0xBD)
コ� (trailing bytes 0xE3 0x83)
Is there some environment setting, or an alternative way of reading the text file, that would eliminate this problem?
I found the problem, thanks to Raymond Chen's observation that the number of bytes in the source string was incorrect for 我爱你 and コム.
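A quick way to verify that observation is to hex-dump each line as raw bytes before converting it; a small sketch (dump_bytes is a hypothetical helper):

#include <stdio.h>

/* Sketch: print each byte of the line fgets returned, in hex. For the two
   failing strings this shows a truncated trailing sequence (0xE4 0xBD for
   你, 0xE3 0x83 for ム) with the final 0xA0 byte missing. */
static void dump_bytes(const char *s)
{
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p)
        printf("%02X ", *p);
    putchar('\n');
}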
The code that I'm debugging trims trailing whitespace from each line read by fgets, and that is what corrupts these two strings: the UTF-8 encodings of 你 (0xE4 0xBD 0xA0) and ム (0xE3 0x83 0xA0) both end in the byte 0xA0, which is the Latin-1 non-breaking space and so was apparently classified as whitespace by the trimming code. Stripping it leaves a truncated multi-byte sequence, which is exactly what MultiByteToWideChar rejects under MB_ERR_INVALID_CHARS.
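The fix is to make the trim UTF-8-safe by stripping only ASCII whitespace; a sketch of that idea (rtrim_ascii is an illustrative replacement, not the original routine):

#include <string.h>

/* Sketch: trim only ASCII whitespace, so bytes >= 0x80 (which in UTF-8 are
   always lead or continuation bytes, including the 0xA0 that ends both
   你 and ム) are never stripped. */
static void rtrim_ascii(char *s)
{
    size_t len = strlen(s);
    while (len > 0) {
        unsigned char c = (unsigned char)s[len - 1];
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
            s[--len] = '\0';
        else
            break;
    }
}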