Selenium cannot find the ID of reddit comments, why?

I've been using Selenium to take screenshots of Reddit posts and comments, and I've run into an issue that I can't find a fix for online. My code gives Selenium the ID of the element I want to screenshot, and for the main Reddit post this works great. For comments, though, it either times out (when using EC.presence_of_element_located()) or reports that it can't find the element (when using driver.find_element()).

Here's the code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def getScreenshotOfPost(header, ID, url):
    driver = webdriver.Chrome() #Using chrome to define a web driver
    try:
        driver.get(url) #Plugs the reddit url into the web driver
        driver.set_window_size(width=400, height=1600)
        wait = WebDriverWait(driver, 30)
        driver.execute_script("window.focus();")
        method = By.ID #ID is what I've found to be the most reliable method of look-up
        handle = f"{header}{ID}" #The header will be of the form "t3_" for posts and "t1_" for comments, and the ID is the ID of the post or comment.

        element = wait.until(EC.presence_of_element_located((method, handle)))
        driver.execute_script("window.focus();")

        #Write the element's screenshot to disk; the file is closed automatically
        with open(f'Post_{header}{ID}.png', "wb") as fp:
            fp.write(element.screenshot_as_png)
    finally:
        driver.quit() #Always close the browser, even if the wait times out

I've tried searching by ID, CLASS_NAME, CSS_SELECTOR, and XPATH, and none of them work. I've double-checked that t1_{the id of the comment} is the correct ID for a comment, regardless of the Reddit post. Increasing the wait time on my web driver doesn't help. I'm not sure what the issue is.

Thanks in advance for any help!

JeffC On BEST ANSWER

I see what the problem is: there are a TON of nested shadow roots on the page. If you are familiar with IFRAMEs, they behave similarly, in that a lookup from the top-level document can't see the elements inside them. Unlike an IFRAME, though, there is no driver.switch_to call for a shadow root. Instead, you locate the shadow host element, read its shadow_root property, and call find_element on that. You will have to go through each shadow root, one at a time, and keep diving until you get to the element you want.

Some example code:

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By

def test_recommended_code():
    driver = Chrome()

    driver.get('http://watir.com/examples/shadow_dom.html')

    # Find the element that hosts the shadow root, then search inside that root
    shadow_host = driver.find_element(By.CSS_SELECTOR, '#shadow_host')
    shadow_root = shadow_host.shadow_root
    shadow_content = shadow_root.find_element(By.CSS_SELECTOR, '#shadow_content')

    assert shadow_content.text == 'some text'

    driver.quit()
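
When there are several shadow roots nested inside each other, the same two steps (find the host, read its shadow_root) can be repeated in a loop. The helper below is only a sketch of that idea: the name find_in_nested_shadow and its parameters are made up for illustration, and the right chain of host selectors for a Reddit comment would have to be read off the page in the browser's dev tools. It sticks to CSS selectors, which are the safest choice inside a shadow root.

```python
# The string value of selenium's By.CSS_SELECTOR; By.CSS_SELECTOR can be
# passed instead when selenium is imported.
CSS = "css selector"


def find_in_nested_shadow(root, host_selectors, target_selector):
    """Dive through a chain of shadow hosts, then look up the target element.

    root            -- a WebDriver (or any object with find_element)
    host_selectors  -- CSS selectors of the shadow hosts, outermost first
    target_selector -- CSS selector of the element inside the innermost root

    Hypothetical helper, not part of Selenium itself.
    """
    context = root
    for selector in host_selectors:
        host = context.find_element(CSS, selector)
        # shadow_root is provided by recent Selenium 4 releases and only
        # works for open shadow roots.
        context = host.shadow_root
    return context.find_element(CSS, target_selector)
```

With the example page above, which has a single level of nesting, the call would be find_in_nested_shadow(driver, ['#shadow_host'], '#shadow_content'). The returned element can then be used as usual, for example with element.screenshot_as_png.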

You can read more about it in this article.