I came across a case where, for some reason, I cannot get the page source after the JavaScript has executed:
#!/usr/bin/python
from selenium import webdriver
import time
driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true',
                                           '--ssl-protocol=any'])
driver.set_window_size(1124, 850)
driver.get('https://semanticscholar.org/search?q=The+iterative+deepening+A*')
time.sleep(20)
print driver.page_source.encode('utf-8')
I used to have a waiting strategy in my code, but have switched to a simple sleep for this minimal example.
Is there something special about the page whose source I am trying to read?
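For reference, the waiting strategy mentioned above could be sketched roughly as follows. This is only an illustration, not the code originally used: `wait_for_source` is a hypothetical helper (it is not part of Selenium), and the `marker` string is an assumption about what the rendered page would contain. It relies only on the driver exposing `page_source`. Selenium also ships its own `WebDriverWait` in `selenium.webdriver.support.ui`, which would normally be the first choice.

```python
import time


def wait_for_source(driver, marker, timeout=20.0, poll=0.5):
    """Poll driver.page_source until `marker` appears or `timeout` elapses.

    Hypothetical helper, not a Selenium API. Returns the page source once
    the marker is found and raises RuntimeError if the timeout is reached.
    """
    deadline = time.time() + timeout
    while True:
        source = driver.page_source
        if marker in source:
            return source
        if time.time() >= deadline:
            raise RuntimeError(
                "marker %r not found within %s seconds" % (marker, timeout))
        # Give the page's JavaScript more time before checking again.
        time.sleep(poll)
```

Compared with a fixed `time.sleep(20)`, this returns as soon as the expected content shows up and fails loudly when it never does, which makes it easier to tell a slow page from one that never renders.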
EDIT: Interestingly, I tried using headless Chrome instead of PhantomJS and it worked! Here is the code:
#!/usr/bin/python
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
import time
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.binary_location = '/usr/bin/google-chrome'
driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), chrome_options=chrome_options)
driver.set_window_size(1124, 850)
driver.get('https://semanticscholar.org/search?q=The+iterative+deepening+A*')
time.sleep(20)
print driver.page_source.encode('utf-8')
Based on the details in your question, here are my observations:
Headless Chrome:
Code Block:
Console Output:
PhantomJS:
Code Block:
Console Output:
Conclusion
Although the page source returned through ChromeDriver differs somewhat from the one returned through PhantomJSDriver, both WebDriver variants provide the relevant page source.
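To see exactly where the two page sources differ, one option is to diff them with the standard library. The sketch below assumes both sources have already been captured as strings; `diff_sources` and the `chrome` / `phantomjs` labels are hypothetical names chosen for illustration.

```python
import difflib


def diff_sources(source_a, source_b, name_a="chrome", name_b="phantomjs"):
    """Return the unified-diff lines between two page-source strings.

    Hypothetical helper; an empty list means the sources are identical.
    """
    return list(difflib.unified_diff(
        source_a.splitlines(),
        source_b.splitlines(),
        fromfile=name_a,
        tofile=name_b,
        lineterm="",
    ))
```

Calling it with the `driver.page_source` value saved from each driver and printing the returned lines one per row gives a quick view of which elements one browser rendered and the other did not.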