Java: StandardCharsets UTF-16 issue


I'm trying to understand why I get different results when I write the same test string to a file using different encodings.

With StandardCharsets.UTF_16LE the file shows "test" (seems correct), while with StandardCharsets.UTF_16BE it shows " t e s t" (seems wrong).

Can someone please explain why, with UTF_16BE, the result has extra spaces between the letters?

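import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

String filename = "C:\\Users\\name\\Downloads\\debugging.txt";
String str = "test";

File fl = new File(filename);

// try-with-resources closes the writer, which also flushes the buffered text to the file
try (FileOutputStream fos = new FileOutputStream(fl);
     //BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos, StandardCharsets.UTF_16LE))) { //appears to work
     BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(fos, StandardCharsets.UTF_16BE))) { //appears not to work
    bw.write(str);
} catch (IOException ignored) {
    //some actions
}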

There is 1 best solution below

Pino On BEST ANSWER

The Javadoc for java.nio.charset.Charset says:

When decoding, the UTF-16BE and UTF-16LE charsets interpret the initial byte-order marks as a ZERO-WIDTH NON-BREAKING SPACE; when encoding, they do not write byte-order marks.
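
To see what that means in practice, here is a small sketch that dumps the bytes each charset produces for "test". The class name Utf16Bytes and the hex helper are made up for illustration; only StandardCharsets and String.getBytes come from the JDK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Bytes {
    // Encodes the text with the given charset and renders the bytes as spaced hex.
    public static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // No BOM, high byte first: 00 74 00 65 00 73 00 74
        System.out.println("UTF_16BE: " + hex("test", StandardCharsets.UTF_16BE));
        // No BOM, low byte first: 74 00 65 00 73 00 74 00
        System.out.println("UTF_16LE: " + hex("test", StandardCharsets.UTF_16LE));
        // Big-endian with a leading BOM: FE FF 00 74 00 65 00 73 00 74
        System.out.println("UTF_16:   " + hex("test", StandardCharsets.UTF_16));
    }
}
```

Neither UTF_16BE nor UTF_16LE output starts with a byte-order mark, so the file itself carries no hint about which of the two was used.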

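Without a BOM, an editor has to guess the encoding, and it may guess wrong; some editors do not support certain encodings at all. For example, UTF-16BE encodes "test" as the bytes 00 74 00 65 00 73 00 74, and an editor that falls back to a single-byte encoding may render each 00 byte as a blank, which would give " t e s t". So the result depends on the editor you use to read the file.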
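
If you want editors to have less guessing to do, one option is to put a BOM into the file yourself. The sketch below shows two ways, assuming your editor honours a BOM (not all do): StandardCharsets.UTF_16 writes a big-endian BOM on its own when encoding, and with UTF_16LE you can write U+FEFF as the first character. The class and method names are hypothetical, and a ByteArrayOutputStream stands in for the FileOutputStream from the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf16WithBom {
    // UTF_16LE writes no BOM itself, so U+FEFF is written explicitly as the first
    // character; the little-endian encoder turns it into the bytes FF FE.
    public static void writeLeWithBom(OutputStream out, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_16LE)) {
            w.write('\uFEFF');
            w.write(text);
        }
    }

    // UTF_16 encodes big-endian and emits the BOM (FE FF) without extra code.
    public static void writeUtf16(OutputStream out, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_16)) {
            w.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream le = new ByteArrayOutputStream();
        writeLeWithBom(le, "test");
        ByteArrayOutputStream be = new ByteArrayOutputStream();
        writeUtf16(be, "test");
        // Both outputs hold 2 BOM bytes plus 8 bytes for the four letters.
        System.out.println(le.size() + " " + be.size());
    }
}
```

To write a real file, pass new FileOutputStream(filename) instead of the in-memory stream.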