CString to UTF8 conversion fails for "ý"

In my application I want to convert a string that contains the character ý to UTF-8, but it is not giving the expected result. I am using the WideCharToMultiByte function, and it converts this particular character to ý.

For example: input "ý", output "ý"

Please see the code below.

CString strBuffer("ý");
char *utf8Buffer = (char*)malloc(strBuffer.GetLength()+1);
int utf8bufferLength = WideCharToMultiByte(CP_UTF8, 0, (LPCWSTR)strBuffer.GetBuffer(strBuffer.GetLength() + 1),
 strBuffer.GetLength(), utf8Buffer, strBuffer.GetLength() * 4, 0, 0);

Please give your suggestions...

  • Binoy Krishna

There are 2 best solutions below

Dialecticus On

The Unicode code point for the letter ý, according to this page, is 253 decimal or FD hexadecimal (U+00FD). Its UTF-8 representation is the two bytes 195 189 decimal, or C3 BD hexadecimal. Your program and/or debugger may display these two bytes as the letters ý, which is what you get when each byte is shown as a separate character in a single-byte code page such as Windows-1252. They are UTF-8 code units, so they are bytes, not letters.

In other words, the output and the code are fine, and your expectations are wrong. I can't say why they are wrong because you haven't mentioned what exactly you were expecting.
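
To make the byte values concrete, here is a minimal portable sketch that does not use the Windows API at all. The helper names `encode_utf8` and `misread_as_latin1` are made up for this illustration. The first encodes a single code point as UTF-8, the second mimics a viewer that treats every byte as one Latin-1 character (for the bytes C3 and BD, Latin-1 and Windows-1252 agree).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
std::string encode_utf8(std::uint32_t cp)
{
    // Reject values outside the Unicode range and lone surrogates.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: treat every byte as a separate Latin-1 character
// and re-encode the result as UTF-8. This reproduces what a viewer does
// when it assumes a single-byte code page for UTF-8 data.
std::string misread_as_latin1(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        out += encode_utf8(b);
    }
    return out;
}
```

Calling `encode_utf8(0xFD)` yields the two bytes C3 BD, and passing that result to `misread_as_latin1` yields the text "ý" that the question reports. The conversion itself is correct; only the way the bytes are displayed is misleading.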

EDIT: The code should be improved. See Rudolfs' answer for more info.

Rudolfs Bundulis On

While I was writing this, an answer explaining the character values you are seeing was already posted. However, there are two things to mention about your code:

1) You should use the _T() macro when initializing the string: CString strBuffer(_T("ý")); The _T() macro is defined in tchar.h and maps to the correct string literal type depending on whether the _UNICODE macro is defined.

2) Do not use GetLength() to calculate the size of the UTF-8 buffer. GetLength() returns the number of characters in the CString, not the number of bytes the UTF-8 form needs, and a single character can take more than one byte. See the documentation of WideCharToMultiByte in MSDN; the comments section shows how to use the function itself to calculate the needed length for the UTF-8 buffer.
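
As a rough illustration of point 2, the following portable sketch counts how many UTF-8 bytes a UTF-16 string needs. The helper `utf8_bytes_needed` is a made-up name, it assumes well-formed UTF-16 input, and it is not a replacement for asking WideCharToMultiByte for the size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-8 bytes needed to hold a UTF-16
// string, not counting a terminating NUL. Assumes well-formed UTF-16.
std::size_t utf8_bytes_needed(const std::u16string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        if (u < 0x80) {
            n += 1;                      // ASCII
        } else if (u < 0x800) {
            n += 2;                      // e.g. U+00FD (ý)
        } else if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: together with the following low surrogate
            // it encodes one code point above U+FFFF, which takes 4 bytes.
            assert(i + 1 < s.size());
            n += 4;
            ++i;                         // skip the low surrogate
        } else {
            n += 3;                      // rest of the Basic Multilingual Plane
        }
    }
    return n;
}
```

For the one-character string "ý" the UTF-16 length is 1, but `utf8_bytes_needed` reports 2, so a buffer of GetLength() + 1 bytes has no room left for the terminating NUL.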

Here is a small example that verifies the output according to the codepoints and demonstrates how to use the automatic length calculation:

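#define _AFXDLL
#include <afx.h>

#include <iostream>

int main(int argc, char** argv)
{
    // Assumes a Unicode build (_UNICODE defined), so CString holds wide characters
    CString wideStrBuffer(_T("ý"));
    // First call: with an input length of -1 the source is treated as zero
    // terminated, and the returned size includes the terminating NUL
    int utf8Length = WideCharToMultiByte(CP_UTF8, 0, wideStrBuffer.GetString(), -1, NULL, 0, NULL, NULL);
    CStringA utf8Buffer;
    // Second call: convert into a buffer of exactly the required size
    WideCharToMultiByte(CP_UTF8, 0, wideStrBuffer.GetString(), -1, utf8Buffer.GetBuffer(utf8Length), utf8Length, NULL, NULL);
    // Let the string recompute its length from the terminating NUL
    utf8Buffer.ReleaseBuffer();
    // ý (U+00FD) must come out as the two bytes 195 189 (C3 BD)
    if (static_cast<unsigned char>(utf8Buffer[0]) == 195 && static_cast<unsigned char>(utf8Buffer[1]) == 189)
    {
        std::cout << "Conversion successful!" << std::endl;
    }
    return 0;
}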