Web scraping with Python-Windmill (How to accurately wait till a page fully loads)

  1. I have been experimenting with Windmill for web scraping. However, the waits.forPageLoad API does not seem able to check whether the page is fully rendered.

  2. When I need to reload a page whose DOM already exists, I use waits.forElement to detect an element so that the script can "decide" that the page has loaded. This occasionally detects the element before the page has actually reloaded.

  3. Loading a page with the Windmill test client in Firefox also seems to take forever. A page that loads in about 2 seconds in my regular Firefox browser may take up to a minute in the test client. Is it normal for it to take so long?

  4. Lastly, are there better alternatives to Windmill for web scraping? The documentation seems a bit sparse.

Please advise. Thanks :P


There is 1 solution below

TangibleDream
 client.waits.sleep(milliseconds=u'2000')

This is an absolute pause of 2 seconds.

 client.waits.forPageLoad(timeout=u'20000')

This holds up the lines that follow until the page loads or until 20 seconds have elapsed, whichever comes first. Think of it as a time-bounded assert: if the page loads in under 20 seconds it passes; if not, it fails.
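To make the "time-bounded assert" idea concrete, here is a minimal sketch of a polling wait in plain Python. It is not Windmill code and does not show how Windmill implements forPageLoad; the names wait_until and WaitTimeout are hypothetical and exist only for this illustration. It assumes Python 3.3 or later for time.monotonic.

```python
import time


class WaitTimeout(AssertionError):
    """Raised when the condition does not become true before the deadline."""


def wait_until(condition, timeout_ms=20000, poll_ms=100):
    """Poll `condition` until it returns a truthy value or `timeout_ms` elapses.

    Hypothetical helper, not part of Windmill. Returns the truthy value on
    success and raises WaitTimeout if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        result = condition()
        if result:
            # Condition met before the deadline: the "assert" passes.
            return result
        if time.monotonic() >= deadline:
            # Deadline reached without success: the "assert" fails.
            raise WaitTimeout("condition not met within %d ms" % timeout_ms)
        time.sleep(poll_ms / 1000.0)


if __name__ == "__main__":
    # Stand-in for a page that becomes "ready" roughly 0.3 seconds from now.
    start = time.monotonic()
    wait_until(lambda: time.monotonic() - start > 0.3, timeout_ms=2000)
    print("condition met, continuing with the script")
```

Unlike a fixed sleep, this returns as soon as the condition holds, so a fast page does not pay for the full timeout.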

I hope this helps,

TD