How to use HtmlUnit to record all requests fired while rendering a page?


I'm using HtmlUnit to record all requests fired when loading a local HTML file. This is the test file:

<script type="text/javascript">
  !(function () {
    var adc = function (str) {
      return decodeURIComponent(escape(window.atob(str)));
    };
    document.write(
      adc(
        "PGEgaHJlZj0iaHR0cHM6Ly9kc3A4dTRqaGE0NHJyLmNsb3VkZnJvbnQubmV0L2RpcmVjdC8xOTgyMzQxMDE/YWR4PUFsZ29yaXgoUHJvKSZhcHA9MjgxOTQwMjkyJnByaWNlPTAuOTEwMSZyZD1MVldrTUJRTUxGQzhrIiB0YXJnZXQ9Il9ibGFuayI+PGltZyBzcmM9Imh0dHBzOi8vZHNwOHU0amhhNDRyci5jbG91ZGZyb250Lm5ldC9pbXAvMTk4MjM0MTAxP2FkeD1BbGdvcml4KFBybykmYXBwPTI4MTk0MDI5MiZwcmljZT0wLjkxMDEmcmQ9TFZXa01CUU1MRkM4ayIgd2lkdGg9IjMyMCIgaGVpZ2h0PSI1MCI+PC9hPjxpbWcgc3JjPSJodHRwczovL2QybWsybmg4dmZmNzY4LmNsb3VkZnJvbnQubmV0L3YxL3BpeGVsP2E9MTAyOSZiPTEwNDYmYz0xJmQ9ZmZkNDkyNTFjYjkwOGI5NSZlPTU4Njg0YWYzN2Q0MTFhNGImZj0wLjkxMDEmZz0wLjkxMDE0Jmg9MTA0NSZpPWZiMTJmZjY5ZjJkZmFiZjAmaz04MzM1NTEzMTAxMDI5ODI2NjAmcmQ9TFZXa01CUU1MRkM4ayIgYm9yZGVyPSIwIiB3aWR0aD0iMSIgaGVpZ2h0PSIxIi8+"
      )
    );
    document.write(
      adc(
        "PGltZyBzcmM9Imh0dHBzOi8vdXNlLnRyay5zdnItYWxnb3JpeC5jb20vaW1wP2NycHY9MyZpbmZvPTlFbVpwWkNNdWdETnVFak14NHlNeTBEY3BWbkp5a2pNd1FUT3hnak05UVdkaVpDTTlRSGR5Tm5KeDBEZDBsbVltRVRQdEJuWW1Bek45STNjeU5uSngwVGJtQm5KdzBEYzRWbUp3MFRhd0ZtSngwVGUwRm1KelFUTndZVFBrbDJjbWdUTTVJak41RURPMkVUUDBKbkp4a2pMdzBUYmhaU000TWpOdUFUUHRaU013RVRPdUFUUHRKbUp3a3pNOWtuWW1FMFVWMXpZbUV6TTJRek54MERjbVEyTmhKV04xUWpNelUyTTNnell3Z1RZalZHTndVR09rSlRZNVVUTW1oVFo5RW5jJnByaWNlPSR7QVVDVElPTl9QUklDRX0mcz02MDU0MyZyPWU4ZjE1OWEyZDhlMDRlY2E4MGM4NzNlMzI0NTViYTdkIiB3aWR0aD0iMSIgaGVpZ2h0PSIxIiBzdHlsZT0iZGlzcGxheTpub25lOyI+PGRpdiBpZD0iZG9qczIwMTJiMDVhIiBkYXRhLXdpZHRoPSIzMjAiIGRhdGEtaGVpZ2h0PSI1MCIgZGF0YS10cms9J2h0dHBzOi8vdXNlLnRyay5zdnItYWxnb3JpeC5jb20vaW1wP2NycHY9MyZpbmZvPTlFbVpwWkNNdWdETnVFak14NHlNeTBEY3BWbkp5a2pNd1FUT3hnak05UVdkaVpDTTlRSGR5Tm5KeDBEZDBsbVltRVRQdEJuWW1Bek45STNjeU5uSngwVGJtQm5KdzBEYzRWbUp3MFRhd0ZtSngwVGUwRm1KelFUTndZVFBrbDJjbWdUTTVJak41RURPMkVUUDBKbkp4a2pMdzBUYmhaU000TWpOdUFUUHRaU013RVRPdUFUUHRKbUp3a3pNOWtuWW1FMFVWMXpZbUV6TTJRek54MERjbVEyTmhKV04xUWpNelUyTTNnell3Z1RZalZHTndVR09rSlRZNVVUTW1oVFo5RW5jJnByaWNlPSR7QVVDVElPTl9QUklDRX0mcz02MDU0MyZyPWU4ZjE1OWEyZDhlMDRlY2E4MGM4NzNlMzI0NTViYTdkJyBkYXRhLWlkPSdBbGdvcmlYLWU4ZjE1OWEyZDhlMDRlY2E4MGM4NzNlMzI0NTViYTdkJz48c2NyaXB0IHR5cGU9J3RleHQvamF2YXNjcmlwdCcgYXN5bmMgc3JjPSJodHRwczovL3Ryay5zdnItYWxnb3JpeC5jb20vc3RhdGljL200LmpzP3Q9OTM0NDIzIj48L3NjcmlwdD48L2Rpdj4="
      ).replace(new RegExp(adc("XCR7QVVDVElPTl9QUklDRX0="), "g"), "0.6381")
    );
  })();
</script>
<img
  src="https://use.trk.svr-algorix.com/win?crpv=3&info=9EmZpZCMugDNuEjMx4yMy0DcpVnJykjMwQTOxgjM9QWdiZCM9QHdyNnJx0Dd0lmYmETPtBnYmAzN9I3cyNnJx0TbmBnJw0Dc4VmJw0TawFmJx0Te0FmJzQTNwYTPkl2cmgTM5IjN5EDO2ETP0JnJxkjLw0TbhZSM4MjNuATPtZSMwETOuATPtJmJwkzM9knYmE0UV1zYmEzM2QzNx0DcmQ2NhJWN1QjMzU2M3gzYwgTYjVGNwUGOkJTY5UTMmhTZ9Enc&price=0.6381&s=60543&r=e8f159a2d8e04eca80c873e32455ba7d"
  width="1"
  height="1"
  style="display: none"
/>

When I render it in Chrome, the Network tab shows the full list of tracking URLs.

Including the local file itself, 7 requests are fired. This is what I expect to see in my code's printed output.

My code is below:

public class RenderHTML extends WebConnectionWrapper {

    static List<String> list = new ArrayList<String>();


    public RenderHTML(WebClient webClient) throws IllegalArgumentException {
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // Log the URL of the request
        System.out.println(request.getUrl().toString());
        return super.getResponse(request);
    }



    public static void main(String[] args) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Wrap the client with the URLRecorder
            webClient.getOptions().setJavaScriptEnabled(true);
            webClient.waitForBackgroundJavaScriptStartingBefore(100_000);
            webClient.waitForBackgroundJavaScript(100_000);
            webClient.getOptions().setCssEnabled(true);
            webClient.getOptions().setRedirectEnabled(true);
            webClient.getOptions().setUseInsecureSSL(false);
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
            webClient.getOptions().setThrowExceptionOnScriptError(false);
            webClient.getCookieManager().setCookiesEnabled(true);
            webClient.setAjaxController(new AjaxController());
            webClient.getCookieManager().setCookiesEnabled(true);

            webClient.setWebConnection(new RenderHTML( webClient));

            // Load the local HTML file
            HtmlPage page = webClient.getPage("file:///Users/derrickguo/work/project/project_java/analyze_demand_tool_maven/src/main/lib/algorix_us_adm.html");
            
        }
    }
}

But it prints only one request, and then the process finishes.

Could anyone give me a hand with capturing all fired requests? Thank you very much!

Answer by RBRi

HtmlUnit is a headless browser, so images are not downloaded by default. You can switch this on with:

webClient.getOptions().setDownloadImages(true);

I ran some tests with version 3.1.0 and was able to see all the requests.

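Please keep in mind that waitForBackgroundJavaScriptStartingBefore() and waitForBackgroundJavaScript() are not options. They are method calls that block until pending background JavaScript tasks have finished, so calling them before a page is loaded has nothing to wait for. You have to call them after getting a page or after a click (this was not needed in your case).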

My test code:

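// HtmlUnit 3.x package names (2.x uses com.gargoylesoftware.htmlunit instead)
import java.io.IOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.WebRequest;
import org.htmlunit.WebResponse;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.util.WebConnectionWrapper;

public class Issue76084456 extends WebConnectionWrapper {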

    public Issue76084456(WebClient webClient) throws IllegalArgumentException {
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        // Log the URL of the request
        System.out.println("#######" + request.getUrl().toString());
        return super.getResponse(request);
    }

    public static void main(String[] args) throws IOException {
        try (WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
            // Wrap the client's web connection so every request passes through getResponse()
            webClient.setWebConnection(new Issue76084456(webClient));

            webClient.getOptions().setDownloadImages(true);

            // Load the local HTML file
            HtmlPage page = webClient.getPage("file:///C:/RBRi/htmlunit/algorix_us_adm.html");
        }
    }
}
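
To cross-check the recorded list against what the page should request, the Base64 payloads passed to document.write() can also be decoded outside any browser. The following is a minimal sketch in plain Java (JDK only, no HtmlUnit). The class name PayloadUrls, its helper methods, and the example.com sample markup are made up for illustration. It assumes that the page's adc() helper amounts to Base64-decoding the string and reading the bytes as UTF-8, which is what the decodeURIComponent(escape(atob(...))) idiom does.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PayloadUrls {

    // Matches src="..." or src='...' but not attributes such as data-src.
    private static final Pattern SRC = Pattern.compile(
            "(?<![\\w-])src\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Assumed Java counterpart of the page's adc(): Base64-decode, then read as UTF-8.
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    // Collects every src URL found in the markup, in document order.
    static List<String> srcUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = SRC.matcher(html);
        while (m.find()) {
            urls.add(m.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one payload; it is encoded here so the sample is self-contained.
        String html = "<a href=\"https://example.com/click\">"
                + "<img src=\"https://example.com/imp?price=${AUCTION_PRICE}\"></a>"
                + "<script src='https://example.com/m4.js'></script>";
        String payload = Base64.getEncoder()
                .encodeToString(html.getBytes(StandardCharsets.UTF_8));

        // Same macro substitution the test page applies before document.write().
        String decoded = decode(payload).replace("${AUCTION_PRICE}", "0.6381");
        srcUrls(decoded).forEach(System.out::println);
    }
}
```

Running decode() on the real strings from the test file and passing the result to srcUrls() gives the image and script URLs that should show up in the recorder's output once image downloading is enabled. Links (href) are left out on purpose, because a browser does not request them until they are clicked.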