I'm parsing this HTML file:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
<figure>
<img src="content/test.svg" alt="">
<figcaption>Test caption.</figcaption>
</figure>
</body>
</html>
with PowerShell 5. The approach below works well for all other relevant tags (div, p, table, td, tr, ...), but I can't figure out where the "Test caption." text is located in the object.
$html = New-Object -Com "HTMLFile";
$html.IHTMLDocument2_write($htmlContent);
$allTags = $html.all;
$allTags[8].tagName # is FIGURE
$allTags[9].tagName # is /FIGURE
But $allTags[8].outerHTML contains only <FIGCAPTION>. $allTags[9].outerHTML contains only </FIGCAPTION>. innerHTML is empty.
How can $html.documentElement.outerHTML still contain that figcaption text?
Also this w3schools example indicates that it should work like that. What am I missing? Thanks.
It's a compatibility issue.
<figcaption> requires IE9+. Even if you have the latest IE version installed, the IE COM object might still choose to parse the HTML in compatibility mode, which is what happens here.
Insert the X-UA-Compatible meta tag in the document's <head> to force the IE COM object to use the latest IE version.
More info: Towards Internet Explorer 11 Compatibility
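As a sketch of what that looks like, here is the question's document with the meta tag added. The `IE=edge` value is an assumption: it is the commonly used setting for requesting the highest document mode available. Placing the tag first in the head is a precaution so that it is read before any other content.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<!-- Assumed value: IE=edge asks for the newest available document mode -->
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta charset="utf-8">
</head>
<body>
<figure>
<img src="content/test.svg" alt="">
<figcaption>Test caption.</figcaption>
</figure>
</body>
</html>
```

With this in $htmlContent, the same IHTMLDocument2_write call from the question should parse figure and figcaption as regular elements, so the caption text should show up in the element's innerHTML.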