Print Unicode characters on Linux using gcc


I'm trying to print a wchar_t string to the terminal, but the string either doesn't show up or appears as unreadable characters.

I tried this on Xubuntu 22.04 with gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0. Here is the sample code:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "en_US.UTF-8");
    wchar_t sample1[] = { L"Sample TEXT\\自己人自己人人       AZZZZZZZA己中国中中中\n" };
    printf("AAAA\n");
    printf("%ls", L"ABCD");
    printf("%ls", sample1);
    return 0;
}

I compile it with gcc as follows:

gcc test.c -fshort-wchar -o test

I write the data to a file on Windows as Unicode, and I need to read the file and print its content on Linux. wchar_t is 16-bit on Windows but 32-bit on Linux, which is why I used the -fshort-wchar gcc flag.

In the output of the above code I can only see "AAAA\n" and nothing else.

What is the issue with my code? How can I print unicode wchar_t in C properly and be able to read it in my terminal?

To rephrase my question as suggested in the first comment: I have a file saved as UTF-16 on Windows. How do I print it on Linux?

Thanks

2

There are 2 best solutions below

1
KamilCuk On BEST ANSWER

What is the issue with my code?

The issue with your code is that you used -fshort-wchar, while glibc was compiled to work with a 32-bit wchar_t. As a result, printf("%ls", ...) reads the memory as an array of 32-bit elements, while the array actually has 16-bit elements.

How can I print unicode wchar_t in C properly and be able to read it in my terminal?

Either do not use -fshort-wchar, or compile everything you use, such as the C standard library and any other libraries you intend to use, with -fshort-wchar.

I write the data to a file on Windows as unicode and I should read the file and print its content on Linux

Then you have to know which "unicode" format Windows wrote the file in. Once that is known, it is typical to use the iconv command or function to convert the file. You can also use libraries like libunistring or ICU to handle Unicode.

0
chux - Reinstate Monica On

Each stream, such as stdout, has an orientation, either char (byte) or wchar_t (wide), and initially can handle either one. Once the first I/O occurs, such as printf("AAAA\n");, the orientation is established. In this case it becomes char.

To subsequently print with a different orientation, first re-open the stream.

Each stream has an orientation. After a stream is associated with an external file, but before any operations are performed on it, the stream is without orientation. Once a wide character input/output function has been applied to a stream without orientation, the stream becomes a wide-oriented stream. Similarly, once a byte input/output function has been applied to a stream without orientation, the stream becomes a byte-oriented stream. Only a call to the freopen function or the fwide function can otherwise alter the orientation of a stream. (A successful call to freopen removes any orientation.) C17dr §7.21.2 4

printf("AAAA\n");
FILE *f = freopen(NULL, "w", stdout);
if (f == NULL) Handle_failure();
printf("%ls", L"ABCD");

When investigating I/O issues, it is useful to check the return values.

int retval = printf("AAAA\n");
assert(retval >= 0);
retval = printf("%ls", L"ABCD");
assert(retval >= 0);