I'm trying to scrape data from IMDb for some analysis. I'm new to Python and Selenium.
What I want is for Selenium to keep clicking the "50 more" button at the bottom of the page until all the data is loaded. Right now it doesn't do anything.
The URL I want to do this on is the one assigned to url in my current code:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.imdb.com/search/title/?title_type=tv_series&release_date=2016-01-01,2024-12-01&sort=release_date,desc&countries=KR"
driver = webdriver.Chrome()
driver.get(url)
try:
    while True:
        try:
            # Wait for the "Load More" button to be clickable
            more_button = WebDriverWait(driver, 10).until(
                EC.element_to_be_clickable((By.XPATH, '//button[contains(@class, "ipc-see-more__button")]'))
            )
            # Click on the "Load More" button
            more_button.click()
            # Wait for some time to allow the page to load more data
            driver.implicitly_wait(5)
        except TimeoutException:
            # If the "Load More" button is not found, break out of the loop
            break
finally:
    # Close the webdriver
    driver.quit()
I'm not sure whether the XPath and class names are correct, but I have tried multiple variations with no success.
I used the following class name before:
more_button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CLASS_NAME, "ipc-see-more__text"))
)
I expect Selenium to keep clicking the "50 more" button until it can no longer find one, so that all the data ends up available on a single page.
The page shows the total number of titles, so you can use it to work out how many iterations you need.
Then you can run a loop of X iterations, where X is derived from that total, collecting the data from the visible elements and loading 50 more records each time until all records are collected.
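The iteration count can be worked out with a little arithmetic. Here is a minimal sketch, assuming the results header reads something like "1-50 of 1,234"; that text format and the helper names parse_total, clicks_needed and expand_all are assumptions made for illustration, not something taken from the page:

```python
import math
import re


def parse_total(text):
    """Extract the total from a header such as '1-50 of 1,234' (assumed format)."""
    match = re.search(r"of\s+([\d,]+)", text)
    if match is None:
        raise ValueError(f"no total found in {text!r}")
    return int(match.group(1).replace(",", ""))


def clicks_needed(total, page_size=50):
    """Number of 'more' clicks required after the first page is already shown."""
    return max(0, math.ceil(total / page_size) - 1)


def expand_all(click_more, total, page_size=50):
    """Call click_more once per remaining page and return how many clicks were made.

    click_more is a hypothetical callable that waits for the button and clicks it.
    """
    clicks = clicks_needed(total, page_size)
    for _ in range(clicks):
        click_more()
    return clicks
```

With Selenium, click_more would wrap the WebDriverWait(...).until(EC.element_to_be_clickable(...)) call and the click() from the question, so the loop stops after a known number of clicks instead of relying on a timeout.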
The following code works for me:
However, I'd recommend learning about GraphQL so that you can use requests for your scraping (it will be much faster than driving the UI).