I'm using NSXMLDocument with the NSXMLDocumentTidyHTML option to parse potentially "untidy" HTML. It has worked well in every scenario I've tested, except when the string I pass to NSXMLDocument's -initWithXMLString:options:error: is fewer than 12 characters long.
To demonstrate the problem, consider a trivial example in the following two lines of code:
NSXMLDocument *document = [[NSXMLDocument alloc] initWithXMLString:@"<p>Hello</p>" options:NSXMLDocumentTidyHTML error:NULL];
NSLog(@"%@", [document XMLStringWithOptions:NSXMLNodePrettyPrint]);
This prints the following to the console:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
</head>
<body>
<p>Hello</p>
</body>
</html>
The entire string passed in was 12 characters (<p>Hello</p>), and though this is just a demonstration, the output is what I expect: an HTML document with an empty title and the paragraph element inside the body.
However, remove one character from this string (<p>Helo</p>, for example, which is 11 characters) and the output changes drastically, as though NSXMLDocumentTidyHTML had not been specified as an option:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<p>Helo</p>
I've tested this with many more strings of varying lengths and with different tags as well (<tr>123</tr> vs. <tr>12</tr>, for example), and I see the same problem every time. Does anyone have any idea why this fails to produce the HTML I expect when the string is fewer than 12 characters long?
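For anyone who wants to reproduce this, here is a minimal sketch of the kind of test I have in mind. It is only an illustration: the list of test strings is made up, and it assumes a macOS command-line tool linked against Foundation. It builds inputs of increasing length around the same tag, parses each with NSXMLDocumentTidyHTML, and logs the input length next to the name of the resulting root element, so it should be easy to see at which length the root stops being html. It also passes an NSError instead of NULL, in case the parser reports anything useful.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Hypothetical test inputs: the same tag wrapped around text of increasing length.
        NSArray<NSString *> *texts = @[@"H", @"He", @"Hel", @"Helo", @"Hello", @"Hello!"];

        for (NSString *text in texts) {
            NSString *input = [NSString stringWithFormat:@"<p>%@</p>", text];

            // Capture the error rather than passing NULL, so any parser complaint is visible.
            NSError *error = nil;
            NSXMLDocument *document =
                [[NSXMLDocument alloc] initWithXMLString:input
                                                 options:NSXMLDocumentTidyHTML
                                                   error:&error];

            // If tidying was applied, the root element should be <html>;
            // otherwise it will be whatever tag was passed in.
            NSString *rootName = document.rootElement.name;

            NSLog(@"%2lu chars  input=%@  root=%@  error=%@",
                  (unsigned long)input.length, input, rootName, error);
        }
    }
    return 0;
}
```

Comparing the logged root name against the input length makes it possible to check whether the cutoff really depends on the total string length rather than on the particular tag or text.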