Why is NSXMLDocumentTidyHTML ignored when creating an NSXMLDocument from a string of fewer than 12 characters?

I'm using NSXMLDocument with the NSXMLDocumentTidyHTML option to parse potentially "untidy" HTML. It has worked well in every scenario I've tested, except when the string passed to NSXMLDocument's -initWithXMLString:options:error: is fewer than 12 characters long.

To demonstrate the problem, consider a trivial example in the following two lines of code:

NSXMLDocument *document = [[NSXMLDocument alloc] initWithXMLString:@"<p>Hello</p>" options:NSXMLDocumentTidyHTML error:NULL];
NSLog(@"%@", [document XMLStringWithOptions:NSXMLNodePrettyPrint]);

This prints the following to the console:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title></title>
    </head>
    <body>
        <p>Hello</p>
    </body>
</html>

The input string (<p>Hello</p>) is exactly 12 characters long, and although this is just a demonstration, the output is what I expect: an HTML document with an empty title and the paragraph element inside the body.

However, remove one character from this string (<p>Helo</p>, for example) and the output changes drastically, as though NSXMLDocumentTidyHTML had not been specified as an option:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<p>Helo</p>

I've tested this with many other strings of varying lengths and tags (<tr>123</tr> vs. <tr>12</tr>, for example) and see the same behavior. Does anyone know why NSXMLDocument fails to produce the tidied HTML I expect when the input string is shorter than 12 characters?
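
For anyone who wants to reproduce this, here is a minimal sketch of the kind of test I ran. The input strings are just illustrative examples chosen to sit on either side of the 12-character boundary. It logs the length of each input and the name of the resulting root element. I would expect "html" whenever tidying has been applied; based on the outputs above, the 11-character input yields "p" instead.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Illustrative inputs only: 8, 11, 12, and 13 characters long.
        NSArray<NSString *> *inputs = @[@"<p>H</p>", @"<p>Helo</p>",
                                        @"<p>Hello</p>", @"<p>Hello!</p>"];
        for (NSString *input in inputs) {
            NSError *error = nil;
            NSXMLDocument *document =
                [[NSXMLDocument alloc] initWithXMLString:input
                                                 options:NSXMLDocumentTidyHTML
                                                   error:&error];
            if (document == nil) {
                // Report the parse error rather than discarding it with NULL.
                NSLog(@"%lu chars: parse failed: %@",
                      (unsigned long)input.length, error);
                continue;
            }
            // The root element name shows whether the input was wrapped in <html>.
            NSLog(@"%lu chars: root element is <%@>",
                  (unsigned long)input.length, document.rootElement.name);
        }
    }
    return 0;
}
```

Passing an NSError pointer instead of NULL at least rules out a silently reported parse error as the cause.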
