How to load a WebView2 document into HtmlAgilityPack

I can't figure out how to load a WebView2 document into HTML Agility Pack. I'm using JavaScript to get the DOM as a string. However, when I load the DOM string into an HtmlAgilityPack document, every attempt to select nodes from it returns null.

This compiles:

string dom = await webView21.CoreWebView2.ExecuteScriptAsync("document.body.outerHTML"); // Get the DOM with JavaScript
if (dom.Contains("div"))
   System.Diagnostics.Debug.WriteLine("At least one div in the DOM"); // Prints

HtmlAgilityPack.HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
htmlDocument.LoadHtml(dom);
var divs = htmlDocument.DocumentNode.SelectNodes("//div");
if (divs == null)
   System.Diagnostics.Debug.WriteLine("divs is null"); // Prints

When I run this snippet, the first if clause confirms that the string dom contains at least one div. However, when the string is loaded into the htmlDocument, the second if clause shows that the variable divs is null. The variable divs should have a count of at least 1. I'm doing something stupid, but I don't know what.

There is 1 best solution below

Pan ache On

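ExecuteScriptAsync returns the script result as a JSON-encoded string, so the string dom contains Unicode escape sequences, i.e. “\u003C” in place of “<”. HtmlAgilityPack therefore sees no tags at all and SelectNodes returns null. After getting the DOM, these escape sequences can be decoded with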

dom = System.Text.RegularExpressions.Regex.Unescape(dom);

That answers the question.
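
Because the returned value is JSON, another option is to decode it with a JSON parser instead of Regex.Unescape, which also strips the surrounding double quotes that Regex.Unescape leaves in place. The following is only a minimal sketch, assuming System.Text.Json is available in the project (built in on .NET Core 3.0 and later, a NuGet package on .NET Framework). The class name ScriptResultDecoder and the method name Decode are hypothetical, not part of WebView2 or HtmlAgilityPack.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not part of WebView2 or HtmlAgilityPack).
public static class ScriptResultDecoder
{
    // Turns the JSON-encoded string returned by ExecuteScriptAsync
    // back into plain text, e.g. "\u003Cdiv\u003E" becomes <div>.
    public static string Decode(string scriptResult)
    {
        // ExecuteScriptAsync yields the JSON literal null when the script
        // evaluates to null or undefined; treat that as an empty string.
        return JsonSerializer.Deserialize<string>(scriptResult) ?? string.Empty;
    }
}
```

With this helper, the snippet from the question would pass ScriptResultDecoder.Decode(dom) to htmlDocument.LoadHtml instead of dom itself.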

As an aside, using "documentElement" instead of "body" gets more of the DOM (the head element as well as the body), i.e.

string dom = await webView21.CoreWebView2.ExecuteScriptAsync("document.documentElement.outerHTML"); // Get the DOM with JavaScript