Text encoding in FTP download app causing errors


I have written a script that downloads files from an FTP endpoint. I was assured that the files would be UTF-8 encoded, but after downloading and parsing the XML we get garbled text. The process is: download the file, convert the XML to JSON, then parse it and convert it to a different format. After the conversion to JSON we see output like the following, where question marks appear in place of Chinese/Hindi/Arabic characters:

"Size": 3227, "Title": "??? ???? ????? ?? ???? ?? 5 ??? ?? ??? ?? ?? ???? ?? ????????? ?? ???? ???? ??????-Pakistan new army chief

The code snippet is the following:

        ftp.connect("xx.xxx.xxx.xx");
        ftp.login("xxxx", "xxxxx");

        ftp.enterLocalPassiveMode();
        ftp.setControlEncoding("UTF-8");
        ftp.setFileType(FTP.BINARY_FILE_TYPE);
        ...
        String remoteFile1 = ftp.printWorkingDirectory() + "/" + file.getName();
        File downloadFile1 = new File(destFolder + "/" + file.getName());
        OutputStream outputStream1 = new BufferedOutputStream(new FileOutputStream(downloadFile1));
        boolean success = ftp.retrieveFile(remoteFile1, outputStream1);
        outputStream1.flush();
        outputStream1.close();
         ....
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(xmlFile);

        doc.getDocumentElement().normalize();
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer trans = tf.newTransformer();
        StringWriter sw = new StringWriter();
        trans.transform(new DOMSource(doc), new StreamResult(sw));
        String xml = sw.toString();
        JSONObject xmlJSONObj = XML.toJSONObject(xml);
        String jsonPrettyPrintString = xmlJSONObj.toString(4);
        jsonMapper.configure(SerializationFeature.WRAP_ROOT_VALUE, false);...

Can someone advise how to handle the encoding so that non-Latin characters come out correctly?
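One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: `setControlEncoding("UTF-8")` only affects the FTP control connection (commands and file names), and `BINARY_FILE_TYPE` transfers the file bytes unchanged, so the download step itself should not alter the content. Literal `?` characters typically appear when a Java `String` is encoded into a charset that cannot represent its characters, for example when the JSON is printed or written using the platform default charset. The sketch below reproduces that effect and shows writing with an explicit UTF-8 charset instead. The class name `EncodingCheck` and its helper methods are hypothetical and not part of the code above.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodingCheck {

    // Encodes the text to bytes in the given charset and decodes it again.
    // String.getBytes(Charset) replaces unmappable characters with the
    // charset's replacement byte, which is '?' for US-ASCII and ISO-8859-1.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // Writes the JSON text with an explicit charset instead of relying on
    // the platform default.
    public static void writeUtf8(Path target, String json) throws IOException {
        Files.write(target, json.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the file back, again naming the charset explicitly.
    public static String readUtf8(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Three Devanagari letters, written as escapes so the source file's
        // own encoding does not matter.
        String sample = "\u0928\u092F\u093E-Pakistan";

        // A PrintStream that encodes as UTF-8 regardless of the default charset.
        // Whether the glyphs display correctly still depends on the terminal.
        PrintStream out = new PrintStream(System.out, true, StandardCharsets.UTF_8.name());

        out.println("default charset: " + Charset.defaultCharset());
        out.println("via US-ASCII:    " + roundTrip(sample, StandardCharsets.US_ASCII));
        out.println("via UTF-8:       " + roundTrip(sample, StandardCharsets.UTF_8));

        // Round trip through a temporary file using UTF-8 on both sides.
        Path tmp = Files.createTempFile("encoding-check", ".json");
        writeUtf8(tmp, "{\"Title\": \"" + sample + "\"}");
        out.println("file content:    " + readUtf8(tmp));
        Files.delete(tmp);
    }
}
```

If the US-ASCII line shows question marks while the UTF-8 line keeps the text, the same substitution is probably happening wherever the JSON is printed or saved. If the downloaded XML file already contains `?` bytes (0x3F) when inspected in a hex viewer, the characters were lost before the file reached the FTP server and no client-side setting will restore them.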
