Parse a dynamic HTML Page using PhantomJS and Python

I would like to scrape an HTML page where content is not static but loaded with javascript.

I downgraded Selenium to version 3.3.0 to get PhantomJS support (v4.9.x no longer supports PhantomJS) and wrote this code:

from selenium import webdriver
driver = webdriver.PhantomJS('path-to-phantomJS')
driver.get('my_url')
p_element = driver.find_element_by_id(id_='my-id')
print(p_element)

The error I'm getting is:

selenium.common.exceptions.NoSuchElementException: Message: "errorMessage":"Unable to find element with id 'my-id'"

The element I want to return is the <section> tag with a certain id, along with all its subtags. The HTML content looks like this:

<section id="my-id" class="my-class">...</section>

undetected Selenium (best answer)

This error message...

selenium.common.exceptions.NoSuchElementException: Message: "errorMessage":"Unable to find element with id 'my-id'"

...implies that the element wasn't found within the HTML DOM.

A likely cause is that the desired WebElement did not render within the viewport, because PhantomJS by default initializes with a minimized viewport.


Solution

Run PhantomJS with a maximized viewport and use WebDriverWait with visibility_of_element_located() when locating the element, as follows:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS('path-to-phantomJS')
driver.get('my_url')
driver.maximize_window()
p_element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.ID, "my-id")))
print(p_element)
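For intuition, an explicit wait like the one above boils down to polling a condition until it returns something truthy or a timeout expires. The following is a minimal pure-Python sketch of that idea, not Selenium's actual implementation; `wait_until`, `WaitTimeout`, `ignored_exceptions` and `fake_lookup` are hypothetical names used only for illustration.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy within the timeout."""


def wait_until(condition, timeout=20.0, poll_interval=0.5, ignored_exceptions=()):
    """Call condition() repeatedly until it returns a truthy value.

    Exceptions listed in ignored_exceptions are treated as "not ready yet".
    Returns the truthy value, or raises WaitTimeout after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
        except ignored_exceptions:
            result = None
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose element only shows up on the third lookup.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return "<section id='my-id'>" if attempts["count"] >= 3 else None


found = wait_until(fake_lookup, timeout=2.0, poll_interval=0.01)
print(found, "after", attempts["count"], "attempts")
```

With a real driver, the condition would be the element lookup itself, and the "element not found" exception would be the one to ignore while polling.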

Aymen Krifa

This can happen for several reasons: the element may not be present yet when the code runs, or it may have a different ID. If you have double-checked the ID, make sure the page has finished loading before trying to find the element. JS-based content can take a little longer to load. You can add a delay or an explicit wait to ensure the element is available before accessing it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

driver = webdriver.PhantomJS('path-to-phantomJS')
driver.get('my_url')
delay = 10  # Wait up to 10 seconds for the element to be present

try:
    wait = WebDriverWait(driver, delay)
    p_element = wait.until(EC.presence_of_element_located((By.ID, 'my-id')))
    print(p_element.text)
except TimeoutException:
    print("Timeout!")
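Note that p_element.text only gives the visible text. If you want the <section> together with all its subtags, one option is to parse the rendered markup once the wait has succeeded. Below is a rough sketch using only the standard library's html.parser; the `sample` string stands in for what `driver.page_source` would return, and `ElementById` and `extract_element` are hypothetical helper names. It skips HTML comments and assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser


class ElementById(HTMLParser):
    """Collect the markup of the first element with a given id, subtags included."""

    def __init__(self, element_id):
        # Keep entities such as &amp; unconverted so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.element_id = element_id
        self.tag = None    # tag name of the matched element
        self.depth = 0     # nesting depth of same-named tags inside the match
        self.parts = []    # markup pieces collected so far
        self.done = False  # True once the matching end tag was seen

    def _collecting(self):
        return self.tag is not None and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.tag is None:
            if dict(attrs).get("id") == self.element_id:
                self.tag, self.depth = tag, 1
                self.parts.append(self.get_starttag_text())
            return
        if tag == self.tag:
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if self._collecting():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._collecting():
            return
        self.parts.append(f"</{tag}>")
        if tag == self.tag:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self._collecting():
            self.parts.append(data)

    def handle_entityref(self, name):
        self.handle_data(f"&{name};")

    def handle_charref(self, name):
        self.handle_data(f"&#{name};")


def extract_element(page_source, element_id):
    """Return the element's markup as a string, or None if the id was not found."""
    parser = ElementById(element_id)
    parser.feed(page_source)
    parser.close()
    return "".join(parser.parts) if parser.done else None


# Stand-in for driver.page_source after the wait has succeeded.
sample = (
    "<html><body><p>intro</p>"
    '<section id="my-id" class="my-class"><h2>Title</h2><p>a &amp; b</p></section>'
    "<p>outro</p></body></html>"
)
section_html = extract_element(sample, "my-id")
print(section_html)
```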

Hope this helps!