I have created a script to download files from an ftp endpoint. I was assured that the files would be in utf-8 encoding but upon downloading and parsing the xml, we encounter bad formatting. The process is to download the file, convert the xml to json and parse and convert to a different format. What we see after converting to json is for example the following which appears instead of chinese/hindi/arabic characters:
"Size": 3227, "Title": "??? ???? ????? ?? ???? ?? 5 ??? ?? ??? ?? ?? ???? ?? ????????? ?? ???? ???? ??????-Pakistan new army chief
The code snippet is the following:
ftp.connect("xx.xxx.xxx.xx");
ftp.login("xxxx", "xxxxx");
ftp.enterLocalPassiveMode();
ftp.setControlEncoding("UTF-8");
ftp.setFileType(FTP.BINARY_FILE_TYPE);
...
String remoteFile1 = ftp.printWorkingDirectory() + "/" + file.getName();
File downloadFile1 = new File(destFolder + "/" + "/" + file.getName());
OutputStream outputStream1 = new BufferedOutputStream(new FileOutputStream(downloadFile1));
boolean success = ftp.retrieveFile(remoteFile1, outputStream1);
outputStream1.flush();
outputStream1.close();
....
DocumentBuilderFactory docFactory =
DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.newDocument();
doc = docBuilder.parse(xmlFile);
doc.getDocumentElement().normalize();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer trans = tf.newTransformer();
StringWriter sw = new StringWriter();
trans.transform(new DOMSource(doc), new StreamResult(sw));
String xml = sw.toString();
JSONObject xmlJSONObj = XML.toJSONObject(xml);
String jsonPrettyPrintString = xmlJSONObj.toString(4);
jsonMapper.configure(SerializationFeature.WRAP_ROOT_VALUE, false);...
Can someone advise how to ensure the encoding can be changed to output the correct format for foreign characters?