I am parsing XML files with the SAX parser API. I know that some of these files, which are produced on a server, may be generated incorrectly: with unclosed tags or otherwise badly structured. My code works fine in general, but some of these files throw the error below:
org.xml.sax.SAXParseException; systemId: file:///E:/ARCHIVED_LOGS/BACKUP_LOG_190317_0000/trace_file.xml; lineNumber: 201; columnNumber: 105; XML document structures must start and end within the same entity.
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.endEntity(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.peekChar(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
Is there a way to process these files with the SAX parser API without first fixing the problems in them?
I was thinking of processing those files line by line, but that is a pain.
Also, is there a setting for this, similar to the way DTD validation can be skipped, as shown below?
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setValidating(false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
SAXParser saxParser = factory.newSAXParser();
Thank you :)
You can't use a conforming XML parser to process a non-conforming (non-)XML document.
You can use a non-conforming parser, for example an HTML parser, and it might offer the SAX parser API. Whether you can find a parser that accepts the particular flavour of non-XML that is being thrown at you is an open question, since you haven't given us any kind of specification for this non-XML language.
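To illustrate the first point: a conforming parser stops at the first well-formedness error, but because SAX is event-driven, the handler has already received the events that precede it. The following is only a minimal sketch, assuming it is acceptable to keep whatever was read before the error and discard the rest. The names PartialSaxRead, Result and readUntilError are hypothetical, made up for this example. It does not recover anything after the error position.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects element names until the parser reports a fatal error.
class PartialSaxRead {

    // Hypothetical holder for what was seen before the parser gave up.
    static final class Result {
        final List<String> elements = new ArrayList<>();
        SAXParseException error; // stays null if the document was well-formed
    }

    static Result readUntilError(String xml) {
        Result result = new Result();
        // Record the qualified name of every start tag as it is reported.
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                result.elements.add(qName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            // Well-formedness error: keep what was collected so far and stop.
            result.error = e;
        } catch (Exception e) {
            // Configuration or I/O problems are not what this sketch is about.
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Truncated input: <b> and <root> are never closed.
        Result r = readUntilError("<root><a>1</a><b>2");
        System.out.println("elements seen: " + r.elements);
        System.out.println("error: " + (r.error == null ? "none" : r.error.getMessage()));
    }
}
```

Whether partial data like this is usable depends entirely on what the files contain. If the damage is in the middle of a file rather than at the end, everything after it is lost with this approach.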