Stop jsoup Document.toString() from outputting nodes that were not in the input string passed to the HTML parser


We use the jsoup HTML parser to parse a malformed document, such as <div><span>span 1</span><span>span 2</span>

Note that the document is malformed: the closing </div> is missing.

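Java code:
import org.jsoup.Jsoup;
import org.jsoup.parser.Parser;

String destString = "<div><span>span 1</span><span>span 2</span>";
// Parse with the XML parser and an empty base URI, then serialize the document back to a string.
org.jsoup.nodes.Document jDoc = Jsoup.parse(destString, "", Parser.xmlParser());
System.out.println("Parse:" + jDoc.toString());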

This prints: Parse:<div><span>span 1</span><span>span 2</span></div>

Our use case needs to preserve the content exactly, unless it is changed by our conversion logic. For us, the added trailing "</div>" is a bug. How can I override this behavior?

I expect jsoup's Document.toString() not to include nodes that do not exist in the string passed to the parser.
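
To make the requirement concrete, here is a rough sketch of a post-processing workaround in plain Java (no jsoup calls). It is not a jsoup feature. The class and method names (CloseTagWorkaround, appendedSuffix, stripSuffix) are hypothetical. It assumes the serializer reproduces the input byte for byte and only appends closing tags at the very end, as in the example above; anything else is left untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup: detects closing tags appended by a
// serializer and removes them from a later serialization of the same document.
class CloseTagWorkaround {
    // Matches a run consisting only of closing tags, e.g. "</span></div>".
    private static final Pattern ONLY_CLOSING_TAGS = Pattern.compile("(?:</[^<>]+>)+");

    // Returns the closing tags appended after `input` in `serialized`,
    // or "" if `serialized` is not `input` followed only by closing tags.
    static String appendedSuffix(String input, String serialized) {
        if (!serialized.startsWith(input)) {
            return "";
        }
        String suffix = serialized.substring(input.length());
        return ONLY_CLOSING_TAGS.matcher(suffix).matches() ? suffix : "";
    }

    // Removes `suffix` from the end of `output` if it is present there.
    static String stripSuffix(String output, String suffix) {
        if (!suffix.isEmpty() && output.endsWith(suffix)) {
            return output.substring(0, output.length() - suffix.length());
        }
        return output;
    }

    public static void main(String[] args) {
        String input = "<div><span>span 1</span><span>span 2</span>";
        // What the parser/serializer round trip produced for the unchanged input.
        String roundTrip = "<div><span>span 1</span><span>span 2</span></div>";
        String suffix = appendedSuffix(input, roundTrip);

        // Assumed result of the conversion logic, serialized after the changes.
        String converted = "<div><span>SPAN 1</span><span>span 2</span></div>";
        System.out.println(stripSuffix(converted, suffix));
    }
}
```

The idea is to serialize the unchanged document once to learn which closing tags were synthesized, then strip that same suffix after the conversion logic has run. This breaks down if the conversion itself modifies the end of the document, so it is only a stopgap.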
