How to make Selenium press the "50 more" button on IMDB until all data is shown


I'm trying to scrape data from IMDB for some analysis. I'm new to Python and Selenium.

What I want is for Selenium to keep clicking the "50 more" button at the bottom of the page until all the data has loaded. Right now, my code doesn't do anything.

This is the URL I wish to do it on:

https://www.imdb.com/search/title/?title_type=tv_series&release_date=2016-01-01,2024-12-01&sort=release_date,desc&countries=KR
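The filters in this URL are ordinary query-string parameters (title_type, release_date, sort, countries), so the same search URL can be assembled with the standard library if you want to vary them. A small sketch; the dictionary simply mirrors the URL above:

```python
from urllib.parse import urlencode

BASE = "https://www.imdb.com/search/title/"

# Same filters as the URL above; edit the values to change the search.
params = {
    "title_type": "tv_series",
    "release_date": "2016-01-01,2024-12-01",
    "sort": "release_date,desc",
    "countries": "KR",
}

# safe="," keeps the commas literal instead of encoding them as %2C.
url = BASE + "?" + urlencode(params, safe=",")
print(url)
```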

This is my current code:

import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.imdb.com/search/title/?title_type=tv_series&release_date=2016-01-01,2024-12-01&sort=release_date,desc&countries=KR"

driver = webdriver.Chrome()
driver.get(url)

try:
    while True:
        try:
            # Wait for the "Load More" button to be clickable
            more_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, '//button[contains(@class, "ipc-see-more__button")]'))
            )

            # Click on the "Load More" button
            more_button.click()

            # Pause to allow the page to load more data
            # (driver.implicitly_wait() only sets an element-lookup timeout; it does not pause)
            time.sleep(5)

        except TimeoutException:
            # If the "Load More" button is not found, break out of the loop
            break

finally:
    # Close the webdriver
    driver.quit()
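A detail worth checking in code like this: TimeoutException has to be imported (it lives in selenium.common.exceptions). Python only evaluates the name in an except clause when an exception is actually being matched, so a missing import stays silent until the wait finally times out, and then surfaces as a NameError instead of ending the loop. The plain-Python snippet below demonstrates this; UndefinedError is a deliberately undefined, hypothetical name.

```python
def run(should_fail: bool) -> str:
    try:
        if should_fail:
            raise ValueError("simulated timeout")
        return "no exception raised"
    except UndefinedError:  # noqa: F821  (deliberately never defined or imported)
        return "caught"


print(run(False))  # the broken except clause is never evaluated

try:
    run(True)
except NameError as exc:
    # The ValueError forces Python to evaluate `UndefinedError`,
    # and that lookup itself fails with NameError.
    print("NameError:", exc)
```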

I'm not sure whether the XPath and class names are correct, but I have tried multiple variations without success.

I used the following class name before:

more_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.CLASS_NAME, "ipc-see-more__text"))
)

I expect Selenium to keep clicking the "50 more" button until there is none left to press, so that all the data is available on one page.
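The stopping rule described here ("click until there is nothing left to press") can be written independently of the browser. A minimal sketch, assuming two hypothetical callables: find_button returns the button or None, and click presses it. In Selenium, find_button could wrap WebDriverWait(driver, 10).until(EC.element_to_be_clickable(...)) and return None on TimeoutException, and click could be driver.execute_script("arguments[0].click();", button). The max_clicks cap guards against an endless loop. FakePage is a made-up stand-in used only to exercise the logic.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def click_until_exhausted(
    find_button: Callable[[], Optional[T]],
    click: Callable[[T], None],
    max_clicks: int = 1000,
) -> int:
    """Click the button returned by find_button until it returns None.

    Returns the number of clicks performed. Raises RuntimeError if the
    button is still present after max_clicks clicks.
    """
    clicks = 0
    while True:
        button = find_button()
        if button is None:
            return clicks
        if clicks >= max_clicks:
            raise RuntimeError(f"button still present after {max_clicks} clicks")
        click(button)
        clicks += 1


class FakePage:
    """Hypothetical stand-in for the browser: reveals page_size items per click."""

    def __init__(self, total: int, page_size: int = 50) -> None:
        self.total = total
        self.page_size = page_size
        self.loaded = min(page_size, total)

    def find_button(self) -> Optional[str]:
        # Assumes the button disappears once everything is loaded.
        return "50 more" if self.loaded < self.total else None

    def click(self, _button: str) -> None:
        self.loaded = min(self.loaded + self.page_size, self.total)


page = FakePage(total=120)
print(click_until_exhausted(page.find_button, page.click), page.loaded)  # prints: 2 120
```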

Answer by sashkins:

The page shows the total number of titles, so you can use it to figure out how many iterations you need.
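A regex-based variant of the character-by-character number parsing used in the code below, plus the arithmetic for how many clicks are needed. The counter format "1-50 of 1,234" is an assumption; the function only relies on the number that follows the word "of". It also assumes 50 titles are shown initially and each click adds 50.

```python
import math
import re


def parse_total(count_text: str) -> int:
    """Extract the total from a counter such as "1-50 of 1,234" (assumed format)."""
    match = re.search(r"\bof\s+([\d,]+)", count_text)
    if match is None:
        raise ValueError(f"no total found in {count_text!r}")
    return int(match.group(1).replace(",", ""))


def clicks_needed(total: int, already_shown: int = 50, page_size: int = 50) -> int:
    """Number of "50 more" clicks required to reveal all `total` items."""
    remaining = max(total - already_shown, 0)
    return math.ceil(remaining / page_size)


total = parse_total("1-50 of 1,234")
print(total, clicks_needed(total))  # prints: 1234 24
```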

Then start a loop that runs roughly total / 50 times (each click reveals 50 more titles), collecting the data from the visible elements and expanding 50 more records on each pass until all records are collected.
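The incremental part of this approach (parse only the items that appeared since the last pass, with a cap on the number of passes) can be sketched without a browser. get_visible and load_more are hypothetical callables: in Selenium they would correspond to finding the li elements and to the JavaScript click plus wait. Where the answer's code re-locates each new element by index, this sketch simply slices the visible list; the titles are made up.

```python
import math
from typing import Callable, List


def collect_all(
    get_visible: Callable[[], List[str]],
    load_more: Callable[[], None],
    total: int,
    page_size: int = 50,
) -> List[str]:
    """Collect `total` items, reading only the newly revealed ones each round."""
    collected: List[str] = []
    max_rounds = math.ceil(total / page_size) + 1  # safety net against a stalled page
    for _ in range(max_rounds):
        visible = get_visible()
        # Only parse items that appeared since the previous round.
        collected.extend(visible[len(collected):])
        if len(collected) >= total:
            return collected
        load_more()
    raise RuntimeError("too many rounds; the page stopped loading new items")


titles = [f"Title {n}" for n in range(1, 131)]  # 130 made-up titles
state = {"shown": 50}


def get_visible() -> List[str]:
    return titles[: state["shown"]]


def load_more() -> None:
    state["shown"] = min(state["shown"] + 50, len(titles))


result = collect_all(get_visible, load_more, total=len(titles))
print(len(result), result[0], result[-1])  # prints: 130 Title 1 Title 130
```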

The following code works for me:

from selenium import webdriver
from selenium.common import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

driver = webdriver.Chrome()
driver.maximize_window()
wait = WebDriverWait(driver, 10)
short_wait = WebDriverWait(driver, 5)

try:
    driver.get(
        "https://www.imdb.com/search/title/?title_type=tv_series&release_date=2016-01-01,2024-12-01&sort=release_date,desc&countries=KR"
    )

    # get number of movies
    num_line = wait.until(EC.visibility_of_element_located(
        (By.XPATH, "//div[contains(@class, 'ipc-page-grid__item')]/div[contains(text(), ' of ')]"))
    ).text.split("of")[1].replace(",", "").strip()
    num_str = ""
    for char in num_line:
        if not char.isdigit():
            break
        num_str += char
    total_num = int(num_str)

    # collect movies
    iteration = 0
    last_movie_index = 0
    collected_movies = []
    while len(collected_movies) != total_num:
        if iteration > (total_num / 50) + 1:
            raise RuntimeError("Too many iterations")

        displayed_movies = wait.until(EC.visibility_of_all_elements_located(
            (By.XPATH, "//li[contains(@class, 'ipc-metadata-list-summary-item')]")
        ))
        for i in range(last_movie_index, len(displayed_movies)):
            el = wait.until(
                EC.visibility_of_element_located(
                    (By.XPATH, f"(//li[contains(@class, 'ipc-metadata-list-summary-item')])[{i + 1}]")
                )
            )

            # your parsing logic goes here

            name = el.find_element(By.XPATH, './/h3').text
            collected_movies.append(name)
            print("Collected:", name)

        # click show more (if needed)
        last_movie_index = len(collected_movies)
        if last_movie_index != total_num:
            try:
                show_more_btn = wait.until(EC.element_to_be_clickable(
                    (By.XPATH, "//button[contains(@class, 'ipc-see-more__button')]")
                ))
                driver.execute_script("arguments[0].click();", show_more_btn)
                wait.until(EC.invisibility_of_element_located(
                    (By.XPATH, "//button[contains(@class, 'ipc-see-more__button') and @disabled]")
                ))
                driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            except TimeoutException:
                pass

        iteration += 1
finally:
    driver.quit()

However, I'd recommend learning about GraphQL so you can do your scraping with requests instead (it will be much faster than driving the UI).
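GraphQL over HTTP is commonly a POST whose JSON body carries a "query" string and a "variables" object. A sketch of building such a request with the standard library; the endpoint URL, the query and its variable names are placeholders, not IMDb's actual schema, which you would need to discover yourself (for example in your browser's network tab). With requests, the equivalent call would be requests.post(url, json=payload).

```python
import json
import urllib.request

# Placeholder values: these are NOT IMDb's real endpoint or schema.
GRAPHQL_URL = "https://example.com/graphql"
QUERY = """
query Search($first: Int!, $after: String) {
  items(first: $first, after: $after) { edges { node { title } } }
}
"""


def build_graphql_request(query: str, variables: dict, url: str = GRAPHQL_URL) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request."""
    payload = {"query": query, "variables": variables}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_graphql_request(QUERY, {"first": 50, "after": None})
print(req.get_method(), req.full_url)  # prints: POST https://example.com/graphql
# Sending it would be urllib.request.urlopen(req), then json.load() on the response.
```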