I am trying to update some attribute values in an existing XML file using JDOM2. I'm running into what looks like a UTF-8 encoding problem when I write the XML file.
The attribute value is "1 student Noun".
But the value I see in the output is:
1&#x9;student&#x9;Noun
The code I have written is shown below:
SAXBuilder builder = new SAXBuilder();
Document document = builder.build(filePath);
Element root = document.getRootElement();
for (Element sentenceElement : root.getChildren("sentence")) {
    String transcriptionText = "";
    for (Element transcriptionElement : sentenceElement.getChildren("transcription")) {
        // Concatenate the text of every word in this transcription
        for (Element wordElement : transcriptionElement.getChildren("word")) {
            transcriptionText += " " + wordElement.getAttributeValue("text");
        }
        String transcriptionParser = ParserUtil.getResponse(transcriptionText);
        transcriptionElement.getAttribute("text").setValue(transcriptionText);
        transcriptionElement.getAttribute("parser").setValue(transcriptionParser);
    }
    for (Element translationElement : sentenceElement.getChildren("translation")) {
        String translationParser = getResponse(translationElement.getAttributeValue("text"));
        translationElement.getAttribute("parser").setValue(translationParser);
    }
}
Format format = Format.getPrettyFormat();
XMLOutputter xmlOutput = new XMLOutputter(format);
/*try (OutputStream out = new FileOutputStream(filePath)) {
    xmlOutput.output(document, out);
} catch (Exception e) {
    e.printStackTrace();
}*/
xmlOutput.output(document, Files.newBufferedWriter(Paths.get(filePath), StandardCharsets.UTF_8));
I have tried both of these options:
xmlOutput.output(document, Files.newBufferedWriter(Paths.get(filePath), StandardCharsets.UTF_8));
and
try (OutputStream out = new FileOutputStream(filePath)) {
    xmlOutput.output(document, out);
} catch (Exception e) {
    e.printStackTrace();
}
But neither of them solved the problem. How can I fix this?
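The string "1 student Noun" obviously contains tab characters between the words. So if the XML output contains 1&#x9;student&#x9;Noun, that's perfectly OK. The tab character has the Unicode value 9, and &#x9; is a proper XML character reference to represent it. This is not an encoding problem: a literal tab inside an attribute value would be normalized to a space when the file is parsed again, so writing the reference is what preserves the tab.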
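If you want to convince yourself, here is a minimal sketch that parses both forms and compares the results. It uses the JDK's built-in DOM parser rather than JDOM2 so that it runs without extra dependencies. The class name `TabReferenceDemo`, the helper `readParserAttribute`, and the `<word parser="..."/>` sample element are made up for illustration and are not taken from your file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class TabReferenceDemo {

    // Hypothetical helper: parses a small XML string and returns the
    // value of the root element's "parser" attribute.
    static String readParserAttribute(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getAttribute("parser");
    }

    public static void main(String[] args) throws Exception {
        // Tabs written as character references, as in the XMLOutputter output.
        String escaped = readParserAttribute("<word parser=\"1&#x9;student&#x9;Noun\"/>");
        // Tabs written literally inside the attribute value.
        String literal = readParserAttribute("<word parser=\"1\tstudent\tNoun\"/>");

        // The references are decoded back into real tab characters.
        System.out.println(escaped.equals("1\tstudent\tNoun"));
        // The literal tabs are normalized to spaces by the parser.
        System.out.println(literal.equals("1 student Noun"));
    }
}
```

Both comparisons should come out true, which is why the output you are seeing is the form that round-trips your original value. Reading the file back with SAXBuilder and calling getAttributeValue should likewise give you the string with real tabs.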