Cannot scrape AliExpress HTML element


I would like to scrape an arbitrary offer from AliExpress. I'm trying to use Scrapy and Selenium. The issue I face is that when I use Chrome and right-click > Inspect on an element, I see the real HTML, but when I right-click > View Source, I see something different: a mess of HTML, CSS and JS.

As far as I understand, the content is pulled in asynchronously? I guess this is the reason why I can't find the element I am looking for on the page.

I tried using Selenium to load the page first and then get the content I want, but failed. I'm trying to scroll down to the reviews section and get its content.

Is this some advanced anti-bot solution that they have, or is my approach wrong?

The code that I currently have:

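import logging
import time

import scrapy
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

logging.getLogger('scrapy').setLevel(logging.WARNING)


class MySpider(scrapy.Spider):
    name = 'myspider'

    start_urls = ['https://pl.aliexpress.com/item/32998115046.html']

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider run its own initialisation first
        super().__init__(*args, **kwargs)
        self.driver = webdriver.Chrome()

    def parse(self, response):
        self.driver.get(response.url)

        scroll_retries = 20
        data = ''
        while scroll_retries > 0:
            try:
                # find_element_by_class_name was removed in recent Selenium 4 releases
                element = self.driver.find_element(By.CLASS_NAME, 'feedback-list-wrap')
                data = element.text
                scroll_retries = 0
            except NoSuchElementException:
                self.scroll_down(500)
                scroll_retries -= 1

        print("----------")
        print(data)
        print("----------")
        self.driver.quit()

    def scroll_down(self, pixels):
        # scrollBy moves relative to the current position;
        # scrollTo(0, 500) would jump to the same offset on every call
        self.driver.execute_script("window.scrollBy(0, arguments[0]);", pixels)
        time.sleep(2)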

Moein Kameli On BEST ANSWER

If you watch the requests in the Network tab of the browser's developer tools, you will find that the comments are loaded from a separate request, so you can crawl that page instead.
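As a rough illustration of crawling that page directly, here is a minimal sketch using only the standard library. It assumes you have already copied the feedback URL from the Network tab, since it is not reproduced here. The class name `buyer-feedback` and the helpers `extract_reviews` and `fetch` are hypothetical, so check the real markup and adjust accordingly. The parser is deliberately simple and expects reasonably well-formed HTML.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Void elements have no closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while inside a matching element
        self.chunks = []    # text pieces of the element being read
        self.results = []   # text of all finished matches

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.css_class in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace into single spaces
            self.results.append(" ".join(" ".join(self.chunks).split()))
            self.chunks = []

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_reviews(html, css_class="buyer-feedback"):
    """Return the text of each element with css_class (hypothetical name)."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def fetch(url):
    """Download a page; url is whatever you found in the Network tab."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Usage would be along the lines of `extract_reviews(fetch(feedback_url))`, where `feedback_url` is the request you identified yourself. In a Scrapy spider the same idea is simply a `scrapy.Request` to that URL, with no Selenium involved.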