I am trying to scrape product data from AliExpress, but the script does not load all of the HTML content, even after I increase the render time. Here is my code:
```python
from requests_html import HTMLSession

url = 'https://www.aliexpress.com/w/wholesale-test.html?catId=0&initiative_id=SB_20230516115154&SearchText=test&spm=a2g0o.home.1000002.0'

def create_session(url):
    session = HTMLSession()
    request = session.get(url)
    print("Before ", len(request.html.html), "\n\n")
    # The site is dynamic, so run its JavaScript and wait for the page to load
    request.html.render(sleep=5, timeout=20)
    prod = request.html.find('#root > div > div > div.right--container--1WU9aL4.right--hasPadding--52H__oG > div > div.content--container--2dDeH1y > div.list--gallery--34TropR > a:nth-child(1) > div.manhattan--content--1KpBbUi')
    print("After ", len(request.html.html), "\n\n")
    print("output:", prod)
    session.close()

create_session(url)
```
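The selector above pins the full generated class names (`right--container--1WU9aL4`, `manhattan--content--1KpBbUi`, and so on). If those suffixes are build hashes that can change between page versions (an assumption, not verified here), an empty result could mean the class names differ rather than the products being missing. A minimal stdlib-only diagnostic, assuming only the readable prefix such as `manhattan--content` stays stable, counts elements by class prefix in the rendered HTML string:

```python
from html.parser import HTMLParser


class PrefixClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a class starting with a prefix."""

    def __init__(self, prefix: str) -> None:
        super().__init__()
        self.prefix = prefix
        self.count = 0

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Boolean attributes have value None, so skip empty values.
            if name == "class" and value:
                # A class attribute may hold several space-separated classes.
                if any(cls.startswith(self.prefix) for cls in value.split()):
                    self.count += 1
                return  # only one class attribute per tag is relevant


def count_by_class_prefix(html: str, prefix: str) -> int:
    """Return how many elements in `html` carry a class beginning with `prefix`."""
    parser = PrefixClassCounter(prefix)
    parser.feed(html)
    parser.close()
    return parser.count
```

After `render()`, calling `count_by_class_prefix(request.html.html, "manhattan--content")` gives a product count that does not depend on the hash suffix. requests_html's `find()` should also accept a CSS prefix selector such as `div[class^="manhattan--content"]`, although `^=` matches only when that class comes first in the attribute value.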
When I ran the code for the first time, the output was:
```
Before 55448
After 542927
output: [<Element 'div' class=('manhattan--content--1KpBbUi',)>]
```
When I ran the program again, without changing anything in the code, I got:
```
Before 55448
After 251734
output: []
```
When I increased the sleep time from 5 to 100 seconds, changing `request.html.render(sleep=5, timeout=20)` to `request.html.render(sleep=100, timeout=20)`, the output was similar:
```
Before 55448
After 242881
output: []
```
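It is not rendering all of the HTML content: the size of the rendered page varies from run to run, and the product element is found only some of the time.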
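A longer `sleep` only pauses after the page has loaded. If the page adds more products as it is scrolled (an assumption about AliExpress, not something the output above proves), waiting alone will not trigger that loading; requests_html's `render()` also takes a `scrolldown` argument, the number of Page Down presses, with `sleep` then applied after each press. Rather than guessing one large value, a general pattern is to keep sampling until a measurement stops growing. A minimal stdlib-only sketch, where `snapshot` is a hypothetical callable you supply that returns the current HTML (for example one that calls `request.html.render(scrolldown=5, sleep=1, timeout=20)` and then returns `request.html.html`):

```python
import time
from typing import Callable, Tuple


def poll_until_stable(
    snapshot: Callable[[], str],
    measure: Callable[[str], int],
    max_rounds: int = 10,
    pause: float = 1.0,
) -> Tuple[str, int]:
    """Take snapshots until measure() stops increasing or max_rounds is reached.

    `snapshot` is a hypothetical user-supplied callable returning HTML.
    Returns the snapshot with the highest measured value and that value.
    """
    best_html = snapshot()
    best = measure(best_html)
    for _ in range(max_rounds - 1):
        time.sleep(pause)  # give the page time to load more content
        html = snapshot()
        current = measure(html)
        if current <= best:
            break  # no growth since the last sample: treat the page as settled
        best_html, best = html, current
    return best_html, best
```

Passing `len` as `measure` reuses the same "Before"/"After" size check printed in the question; a product count would be a more direct signal.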