Why is "requests-html" not rendering all HTML content?


I am trying to scrape data, but the script does not load all of the HTML content, even after I increased the rendering time. Here is the code:

from requests_html import HTMLSession

url = 'https://www.aliexpress.com/w/wholesale-test.html?catId=0&initiative_id=SB_20230516115154&SearchText=test&spm=a2g0o.home.1000002.0'


def create_session(url):
    session = HTMLSession()
    request = session.get(url)
    # Size of the raw HTML before any JavaScript has run
    print("Before   ", len(request.html.html), "\n\n")
    # The site is dynamic, so render the JavaScript and wait for the page to load
    request.html.render(sleep=5, timeout=20)
    prod = request.html.find('#root > div > div > div.right--container--1WU9aL4.right--hasPadding--52H__oG > div > div.content--container--2dDeH1y > div.list--gallery--34TropR > a:nth-child(1) > div.manhattan--content--1KpBbUi')
    # Size of the HTML after rendering
    print("After   ", len(request.html.html), "\n\n")
    print("output:", prod)
    session.close()


create_session(url)

When I ran the code for the first time, the output was:

Before  55448

After   542927

output: [<Element 'div' class=('manhattan--content--1KpBbUi',)>]

When I ran the program again (without changing anything in the code), I got:

Before  55448  
 
After   251734   

output: []

When I increased the sleep time from 5 to 100, that is, from request.html.render(sleep=5,timeout=20) to request.html.render(sleep=100,timeout=20), I got a similar result:

Before  55448   

After   242881   

output: []

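So the rendered HTML is much smaller on the later runs (about 250,000 characters instead of 542,927), and the element I am looking for is missing. Why is it not rendering all of the HTML content?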
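To narrow this down, here is a minimal sketch of how the lookup could be retried and made less dependent on the full selector path. It does not explain why the rendered size differs between runs. The helper names `find_with_retries` and `render_and_find_products` are hypothetical, and the shorter selector `[class*="manhattan--content"]` is an assumption based on the class name in the output above; suffixes such as `1KpBbUi` look auto-generated and may change between page versions. Only `HTMLSession`, `get`, `html.render(sleep=..., timeout=...)` and `html.find` come from requests-html.

```python
import time
from typing import Callable, List


def find_with_retries(render_and_find: Callable[[], List],
                      attempts: int = 3,
                      delay: float = 2.0) -> List:
    """Call render_and_find until it returns a non-empty list or attempts run out."""
    result: List = []
    for attempt in range(1, attempts + 1):
        result = render_and_find()
        if result:
            break
        if attempt < attempts:
            # Pause before rendering again
            time.sleep(delay)
    return result


def render_and_find_products(url: str,
                             selector: str = '[class*="manhattan--content"]') -> List:
    """Render the page once and return the elements matching selector.

    The default selector is an assumption: it matches any element whose class
    attribute contains "manhattan--content", ignoring the hashed suffix.
    """
    # Imported here so the retry helper above works without requests-html installed
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        response.html.render(sleep=5, timeout=20)
        return response.html.find(selector)
    finally:
        # Close the session even if rendering raises
        session.close()
```

With the `url` from the question, this would be called as `find_with_retries(lambda: render_and_find_products(url))`. If the list is still empty after several attempts, comparing `len(response.html.html)` between attempts may show whether the site returns a different page on each request.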
