Write to a file with a specific encoding in Java


This might be related to my previous question (on how to convert "fÃ¶r" to "för").

So I have a file that I create in my code. Right now I create it by the following code:

FileWriter fwOne = new FileWriter(wordIndexPath);
BufferedWriter wordIndex = new BufferedWriter(fwOne);

followed by a few

wordIndex.write(wordBuilder.toString()); //that's a StringBuilder

ending (after a while-loop) with a

wordIndex.close();

Now the problem is that later on this file is huge, and I want (need) to jump around in it without reading through the entire file. The seek(long pos) method of RandomAccessFile lets me do this.

Here's my problem: The characters in the file I've created seem to be encoded with UTF-8, and the only info I have when I seek is the character position I want to jump to. seek(long pos), on the other hand, jumps in bytes, so I don't end up in the right place, since a UTF-8 character can be more than one byte.

Here's my question: Can I, when I write the file, write it in ISO-8859-15 instead (where a character is a byte)? That way the seek(long pos) will get me in the right position. Or should I instead try to use an alternative to RandomAccessFile (is there an alternative where you can jump to a character-position?)

Joop Eggen On BEST ANSWER

First, the worrying part: FileWriter and FileReader are old utility classes that use the default platform encoding of the computer they run on. Run elsewhere, the same code may produce a different file, and may not correctly read a file that comes from another machine.

ISO-8859-15 is a single-byte encoding. Java, however, holds text in Unicode so that it can combine all scripts, and a char is a UTF-16 code unit. In general a char index is not a byte index, but in your case it will probably work, as long as every character you write exists in ISO-8859-15. Note that a line break may be one char/byte (\n) or two (\r\n), depending on the platform.
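
To make the char-index versus byte-index point concrete, here is a small sketch that counts the bytes "för" occupies in each encoding. The class and method names (EncodedLengthDemo, encodedLength) are made up for illustration, and it assumes the JRE provides the ISO-8859-15 charset, which common OpenJDK builds do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLengthDemo {
    // Number of bytes the text occupies when encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "för", written with an escape so the encoding of this source file does not matter.
        String word = "f\u00f6r";
        Charset latin9 = Charset.forName("ISO-8859-15"); // assumed to be available in this JRE

        System.out.println(word.length());                               // 3 chars
        System.out.println(encodedLength(word, latin9));                 // 3 bytes: one per char
        System.out.println(encodedLength(word, StandardCharsets.UTF_8)); // 4 bytes: ö takes two
    }
}
```

So in ISO-8859-15 the character position and the byte position coincide, while in UTF-8 every ö before the target shifts the byte position by one.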


Personally I think UTF-8 is well established, and it is easier to use:

byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
string = new String(bytes, StandardCharsets.UTF_8);

That way all special quotes, the euro sign, and so on will always be available.
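
If you stay with UTF-8, one possible way to keep using seek(long pos) is to record byte offsets instead of character positions while writing. This is only a sketch of that idea, not something from the question's code: the class Utf8OffsetIndex, its methods, and the one-entry-per-line layout are all assumptions made for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Utf8OffsetIndex {
    // Writes each entry as UTF-8 followed by '\n' and returns the byte offset
    // at which each entry starts (hypothetical file layout: one entry per line).
    static List<Long> writeEntries(Path file, List<String> entries) {
        List<Long> offsets = new ArrayList<>();
        long position = 0;
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(file))) {
            for (String entry : entries) {
                byte[] bytes = (entry + "\n").getBytes(StandardCharsets.UTF_8);
                offsets.add(position);    // remember where this entry begins, in bytes
                out.write(bytes);
                position += bytes.length; // advance by encoded length, not by char count
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    // Seeks to a recorded byte offset and reads one entry, up to '\n' or end of file.
    // Reads byte by byte for simplicity; the byte 0x0A never occurs inside a
    // multi-byte UTF-8 sequence, so splitting on it is safe.
    static String readEntryAt(Path file, long byteOffset) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(byteOffset);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            int b;
            while ((b = raf.read()) != -1 && b != '\n') {
                buffer.write(b);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("wordIndex", ".txt");
        List<Long> offsets = writeEntries(file, Arrays.asList("f\u00f6r", "bar", "b\u00e4r"));
        System.out.println(offsets);
        System.out.println(readEntryAt(file, offsets.get(2)));
        Files.delete(file);
    }
}
```

The trade-off is that the offsets have to be stored somewhere (in memory or in a second file), but no character is ever lost to the encoding.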

At least specify the encoding:

// The second argument must be a java.nio.charset.Charset, not a String.
Files.newBufferedWriter(file.toPath(), Charset.forName("ISO-8859-15"));
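
Putting the ISO-8859-15 route together, a minimal sketch might look like the following. The class Latin9SeekDemo and its methods are hypothetical names, and the code assumes that every character written exists in ISO-8859-15 and that line breaks are written explicitly as \n rather than with newLine(), so that one char is always exactly one byte.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class Latin9SeekDemo {
    // Assumed to be available in this JRE; common OpenJDK builds provide it.
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Writes the text with an explicit single-byte encoding.
    static void writeText(Path file, String text) {
        try (BufferedWriter writer = Files.newBufferedWriter(file, LATIN9)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads `length` characters starting at character position `charPos`.
    // Using charPos as a byte offset is valid only because the file was
    // written in a single-byte encoding.
    static String readChars(Path file, long charPos, int length) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(charPos);
            byte[] buffer = new byte[length];
            raf.readFully(buffer);
            return new String(buffer, LATIN9);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("wordIndex", ".txt");
        writeText(file, "f\u00f6r b\u00e4r \u20ac10"); // "för bär €10"
        System.out.println(readChars(file, 4, 3));     // the three chars starting at index 4
        Files.delete(file);
    }
}
```

Note that a writer obtained from Files.newBufferedWriter throws an IOException when the text contains a character the charset cannot encode, so characters outside ISO-8859-15 fail loudly instead of silently shifting positions.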