Java HtmlUnit No Longer Able To Find List<HtmlFieldSet> Using getByXPath(@class=)


If you go to https://parcelinquirytreasurer.cochise.az.gov/, type 1010501508 in the text field, and click Submit, you get a new web page (Page 2). Page 2 contains a mailing address. If you inspect it, you will see that it starts with fieldset class="addressblock". The result is the same whether you are using Chrome on Windows 11 or Firefox on Ubuntu 24.04 Daily Build.

I was able to reproduce that on Ubuntu 24.04 Daily Build using Java (Oracle JDK 21.0.2) and two roughly year-old versions of HtmlUnit (2.69.0 and 2.70.0) that I downloaded from jar-download.com.

I used the following code:

import java.util.*;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.*;
import com.gargoylesoftware.htmlunit.javascript.*;

class HtmlFieldSetClass {
    public static void main(String[] args) {
        try {
            System.getProperties().put("org.apache.commons.logging.simplelog.defaultlog", "fatal");
            java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
            final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
            webClient.getOptions().setCssEnabled(false);
            webClient.getOptions().setThrowExceptionOnScriptError(false); 
            webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener()); 
            webClient.setCssErrorHandler(new SilentCssErrorHandler());  
            HtmlPage page = webClient.getPage("https://parcelinquirytreasurer.cochise.az.gov/");
            webClient.waitForBackgroundJavaScriptStartingBefore(1000000000);    
            page = (HtmlPage) page.getEnclosingWindow().getEnclosedPage(); 
            List<HtmlForm> forms = page.getByXPath("//form[@method='post']");
            Iterator<HtmlForm> formsIterator = forms.iterator();
            HtmlForm form = formsIterator.next();
            List<HtmlInput> his = form.getByXPath("//input[@value='Submit']");
            Iterator<HtmlInput> hisIterator = his.iterator();
            HtmlInput hi = hisIterator.next();
            HtmlTextInput textField = form.getInputByName("parcelNumber_input");  
            textField.click();
            textField.setValueAttribute("1010501508");  
            webClient.waitForBackgroundJavaScript(10000000);
            webClient.waitForBackgroundJavaScriptStartingBefore(1000000);
            webClient.getOptions().setJavaScriptEnabled(true);
            HtmlPage page2 = hi.click();
            page2.getEnclosingWindow().getJobManager().waitForJobs(10000000);
            webClient.waitForBackgroundJavaScriptStartingBefore(1000000000);
            List<HtmlFieldSet> addressblocks = page2.getByXPath("//fieldset[@class='addressblock']");
            System.out.println("addressblocks.size() = " + addressblocks.size());
        }
        catch (Exception e) {
            System.out.println("Exception:  " + e.toString());
        }
    }
}

I got an output of: addressblocks.size() = 1

I tried to reproduce the same thing using the latest version of HtmlUnit and the one before it (3.9.0 and 3.10.0), both downloaded from htmlunit.sourceforge.io. The only thing I changed in the code is the following: I replaced every occurrence of com.gargoylesoftware. with org. (so com.gargoylesoftware.htmlunit became org.htmlunit).
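
The package rename described above amounts to a plain text substitution on the import lines. A minimal sketch, assuming the source is available as a string; the class and method names (`PackageRenameSketch`, `migrateImports`) are hypothetical and not part of HtmlUnit:

```java
class PackageRenameSketch {
    // Hypothetical helper: rewrites HtmlUnit 2.x package names to the 3.x ones.
    static String migrateImports(String source) {
        return source.replace("com.gargoylesoftware.htmlunit", "org.htmlunit");
    }

    public static void main(String[] args) {
        String oldImports =
            "import com.gargoylesoftware.htmlunit.*;\n"
          + "import com.gargoylesoftware.htmlunit.html.*;\n";
        // Print the same import lines with the 3.x package prefix.
        System.out.print(migrateImports(oldImports));
    }
}
```

This only renames packages. It does not adapt any calls whose behavior differs between 2.x and 3.x.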

I was expecting to see the same output: addressblocks.size() = 1

Instead, I got the following output: addressblocks.size() = 0

It seems that one of the following happened:

  1. A bug was introduced in HtmlUnit somewhere between the 2.70.0 and the 3.9.0 release.
  2. The way to use HtmlUnit to get fieldset class="something" changed between release 2.70.0 and release 3.10.0.

If it's the latter, please tell me how to change my code to get it to work with HtmlUnit 3.10.0.

If it's the former, please let me know so that I can report it to the HtmlUnit developers at https://htmlunit.sourceforge.io/submittingBugs.html

Right now, the best workaround I can think of is to find the latest version of HtmlUnit for which the above code works and revert to that one. If you can think of a workaround that lets me keep using the latest version of HtmlUnit while still reaching the address block on the second web page, please let me know.

Thank you in advance,

GodsGiftToJava

Answer from RBRi:

HtmlUnit 3.0 is a major release - some things are not compatible.

The changes report (https://www.htmlunit.org/changes-report.html#a3.0.0) lists all of them.

You have to adapt the code that provides the value for the search field: replace setValueAttribute with type. With setValueAttribute the search value is not entered the way it was in 2.x, so you don't reach the expected page, and therefore the XPath does not find your element.
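
To illustrate that the XPath expression itself is not the problem, here is a minimal sketch that uses only the JDK's javax.xml.xpath (not HtmlUnit). The two markup strings are invented, heavily simplified stand-ins for the search page and the result page, and the class name `AddressBlockXPathDemo` is hypothetical:

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AddressBlockXPathDemo {
    // Assumed, simplified markup; the real pages are far larger.
    static final String SEARCH_PAGE =
        "<html><body><form method='post'>"
      + "<input name='parcelNumber_input'/></form></body></html>";
    static final String RESULT_PAGE =
        "<html><body><fieldset class='addressblock'>"
      + "MAILING ADDRESS</fieldset></body></html>";

    // Counts the elements matched by the same XPath used in the question.
    static int countAddressBlocks(String xhtml) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "//fieldset[@class='addressblock']",
            new InputSource(new StringReader(xhtml)),
            XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws XPathExpressionException {
        // Still on the search page: nothing matches.
        System.out.println("search page: " + countAddressBlocks(SEARCH_PAGE));
        // On the result page: the fieldset is found.
        System.out.println("result page: " + countAddressBlocks(RESULT_PAGE));
    }
}
```

A size of 0 therefore suggests that page2 is still the search page. Printing page2.asNormalizedText() is a quick way to check which page you actually received.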

This slightly simplified code works here

// Needs: import java.util.*; import org.htmlunit.*; import org.htmlunit.html.*;
final String url = "https://parcelinquirytreasurer.cochise.az.gov/";

try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
    webClient.getOptions().setCssEnabled(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);

    HtmlPage page = webClient.getPage(url);
    webClient.waitForBackgroundJavaScriptStartingBefore(10_000);
    page = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();

    List<HtmlForm> forms = page.getByXPath("//form[@method='post']");
    Iterator<HtmlForm> formsIterator = forms.iterator();
    HtmlForm form = formsIterator.next();

    List<HtmlInput> his = form.getByXPath("//input[@value='Submit']");
    Iterator<HtmlInput> hisIterator = his.iterator();
    HtmlInput hi = hisIterator.next();

    HtmlTextInput textField = form.getInputByName("parcelNumber_input");
    // type() takes the place of the old click() + setValueAttribute() calls.
    textField.type("1010501508");

    HtmlPage page2 = hi.click();
    webClient.waitForBackgroundJavaScriptStartingBefore(10_000);

    List<HtmlFieldSet> addressblocks = page2.getByXPath("//fieldset[@class='addressblock']");
    System.out.println("-------------------------------------------------------------------------------");
    System.out.println(addressblocks.iterator().next().asNormalizedText());
    System.out.println("-------------------------------------------------------------------------------");
}

Hope it works for you also...