I'm trying to collect comments from Yahoo News and having trouble finding the text element of the comments section using Selenium.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
test = "https://www.yahoo.com/news/cdc-advisers-recommend-spring-covid-215003196.html"
# open comments
driver = webdriver.Chrome()
driver.get(test)
comment_button = driver.find_element(By.XPATH, '//a[@class="link caas-button noborder caas-tooltip flickrComment caas-comment top"]')
comment_button.click()
I can click the comments button (so that the section opens), and everything works up to this point. However, I can't locate the text of the comments.
The part highlighted in red is an example of what I want to find, and here is the condensed HTML for it.
<div class="components-MessageContent-index__messageEntitiesWrapper">
<div class="components-MessageContent-components-MessageEntities-MessageEntities__message-entities components-MessageContent-components-MessageEntities-MessageEntities__is-column">
<span class="Typography__text--11-4-15 Typography__t4--11-4-15 Typography__l6--11-4-15">
<div data-spot-im-class="message-text">
<p>...</p>
</div>
</span>
</div>
</div>
I tried several different locators, shown below, but none of them worked.
# 1
driver.find_element(By.XPATH, '//div[@class="components-MessageContent-index__messageEntitiesWrapper"]')
# 2
driver.find_element(By.XPATH, '//div[@class="components-MessageContent-components-MessageEntities-MessageEntities__message-entities components-MessageContent-components-MessageEntities-MessageEntities__is-column"]')
# 3
driver.find_element(By.XPATH, '//span[@class="Typography__text--11-4-15 Typography__t4--11-4-15 Typography__l6--11-4-15"]')
# 4
driver.find_element(By.XPATH, '//div[@data-spot-im-class="message-text"]')
I also tried
driver.find_elements(By.XPATH, '')
driver.find_element(By.CLASS_NAME, '')
driver.find_elements(By.CLASS_NAME, '')
But they didn't work either. I got an error message like this:
NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@class="components-MessageContent-index__messageEntitiesWrapper"]"}
Did I write the XPath incorrectly, or should I use a locator strategy other than XPath? Or is it simply impossible to scrape the comments section of this website using Selenium? I'd appreciate any help with this. Thanks!
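The problem is that the comment elements are inside a shadow root, so a search from the top-level document (which is what driver.find_element does) cannot see them, no matter how the XPath is written. To reach them, first locate the shadow host (the element that owns the shadow root), get its shadow root, and then search for the element(s) you want from there.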
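As a rough illustration of that approach, here is a minimal sketch that assumes a recent Selenium 4 release, where a WebElement exposes a shadow_root property. The helper name get_comment_texts and the HOST_SELECTOR value are hypothetical placeholders, not something taken from the page: you would need to inspect the page in DevTools to find the element that actually owns the #shadow-root. Only the comment selector comes from the HTML in the question. A CSS selector is used inside the shadow root because, as far as I know, Chromium-based drivers do not accept XPath there.

```python
# "css selector" is the same string value as selenium's By.CSS_SELECTOR.
CSS = "css selector"

# Hypothetical placeholder (an assumption): replace with a selector for the
# element that owns the #shadow-root, found by inspecting the page.
HOST_SELECTOR = "REPLACE-WITH-SHADOW-HOST-SELECTOR"

# Taken from the condensed HTML in the question.
COMMENT_SELECTOR = 'div[data-spot-im-class="message-text"]'


def get_comment_texts(driver, host_selector=HOST_SELECTOR,
                      comment_selector=COMMENT_SELECTOR):
    """Return the text of every comment found inside the host's shadow root."""
    # Step 1: find the shadow host in the regular document.
    host = driver.find_element(CSS, host_selector)
    # Step 2: get the shadow root attached to that host.
    root = host.shadow_root
    # Step 3: search inside the shadow root instead of the document.
    comments = root.find_elements(CSS, comment_selector)
    return [comment.text for comment in comments]
```

With the driver from the question, this would be called after comment_button.click(), for example texts = get_comment_texts(driver, "your-host-selector"). If the comments load slowly, the host lookup may need an explicit wait before it succeeds.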