I am trying to store a Chinese character in a variable of type wchar_t and print out this character. However, the program incorrectly prints a ?. Here is my code.
#include <iostream>
int main() {
using std::wcout;
using std::endl;
wchar_t c = L'人'; // 人 is a Chinese character
wcout << c << endl;
}
?
Changing wchar_t to char will cause the program not to compile.
P.S. The encoding of my terminal is UTF-8.
wchar_t is outdated
You may have noticed that there are two versions of the Windows API: the A version and the W version. The W APIs accept wchar_t as their normal parameters. A wchar_t on Windows is 2 bytes long, and a wchar_t string is encoded in "UTF-16LE" (or its older fixed-width predecessor, "UCS-2"), which stores each character in 16-bit units with little-endian byte order. But 16 bits can only represent 2^16 (65536) values, which is not enough for the full Unicode character set: characters outside the Basic Multilingual Plane need two 16-bit units (a surrogate pair), so a single wchar_t cannot hold every character.
See this answer
Note that the size of wchar_t is implementation-defined and varies among platforms. For example, on Linux it is 4 bytes long. If you are writing portable applications, it is a bad idea to have wchar_t in your code.
What to do
So, back to the question: how can you store such characters in your program?
First, a Chinese character is not a char: it does not fit in a single byte, so it has to be stored as a string. 人, for example, takes 3 bytes in UTF-8 and 2 bytes in UTF-16. So you should do it this way:
c is declared as char[] so it can hold a string.
Note that there is no = when defining c; brace initialization like this is C++-only syntax. If you are writing C, you should instead write:
char c[] = "人";
But it may still FAIL to print! Why?
By default, the Windows console uses code page 936 (GBK) on zh-CN systems. MSVC uses that encoding by default, too (I am not completely sure about this; it needs testing). So if your source file, your compiler, and your console all use GBK, you will get the right output.
But if any one of these encodings does not match the others, the program will still print garbage.
It is generally not a good idea to print non-ASCII strings to the console, especially on Windows. You can write them to a file, send them over the network, show them in a Win32 GUI, or even write your own character rendering engine. Don't rely on the console too much.
Trivia
chcp is used to set the code page (encoding) of the Windows console. For example: