Why does Apache Tika's parseToString return an extra newline character at the end of the string?


Important parts of my code:

import org.apache.tika.Tika;
import org.apache.tika.io.TikaInputStream;

Tika tika = new Tika();

"test".getBytes() // <-- array of 4 bytes

tika.parseToString(TikaInputStream.get("test".getBytes())) // <-- returns a 5-character string: "test" plus a trailing newline (\n)

Why is this, and can Tika return only the original content, without the extra newline?

It looks like a bug in Tika that needs to be fixed.

Is there an easy, short way to do this without removing the newline myself after parsing and without using content handlers? The Tika documentation didn't help me.

(My content will probably contain many newline characters that I want to keep, even at the end, and I cannot be sure whether Tika always adds this character at the end or does so only for some kinds of input.)
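To make the manual workaround I would like to avoid concrete, here is a minimal sketch. It assumes Tika appends exactly one "\n" to the output, which I have not confirmed. The class and method names are hypothetical, and the snippet does not call Tika at all: the string "test\n" stands in for the value returned by parseToString.

```java
public class TrailingNewline {

    // Hypothetical helper: removes at most one trailing '\n' from the parsed text.
    // Assumption (unconfirmed): the parser appended exactly one '\n' to the content.
    public static String stripOneTrailingNewline(String parsed) {
        if (parsed != null && parsed.endsWith("\n")) {
            return parsed.substring(0, parsed.length() - 1);
        }
        return parsed;
    }

    public static void main(String[] args) {
        // Stand-in for the result of tika.parseToString(...) on the 4-byte input "test".
        String parsed = "test\n";
        String cleaned = stripOneTrailingNewline(parsed);
        System.out.println(cleaned.length()); // prints 4
    }
}
```

Because only one character is removed, newlines that belong to the original content survive: "a\n\n" becomes "a\n". The weakness is exactly my concern above: if Tika does not add the newline for some input, this would strip a newline that was part of the content.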
