I am trying to scrape a website and download all of its webpages as .html files (including all the page assets) so that each locally saved page opens exactly as it does on the server.
I am currently using Selenium, ChromeDriver, and Python.
Approach:
I tried updating the Chrome browser's preferences and then logging in to the website. After logging in, I want to download the webpage the same way you would by pressing Ctrl+S on the keyboard.
The code below opens the page I want to download, but it neither suppresses the Windows "Save As" pop-up nor downloads the page to the specified path.
```python
import pyautogui
from selenium import webdriver
from selenium.webdriver.common.by import By

# Chrome download preferences passed to the browser at startup
chrome_options = webdriver.ChromeOptions()
preferences = {
    "download.default_directory": "C:\\Users\\pathtodir",
    "download.prompt_for_download": False,
    "download.directory_upgrade": True,
    "safebrowsing.enabled": True,
}
chrome_options.add_experimental_option("prefs", preferences)
driver = webdriver.Chrome(options=chrome_options)

driver.get("***URL to the website***")  # placeholder for the real URL

# Fill in the login form
driver.find_element(By.XPATH, '//*[@id="id_username"]').send_keys('username')
driver.find_element(By.XPATH, '//*[@id="id_password"]').send_keys('password')

# Open the page that should be saved
driver.find_element(By.XPATH, '//*[@id="datagrid-0"]/div[2]/div[1]/div[1]/table/tbody/tr[1]/td[2]/a').click()

# Simulate Ctrl+S, type a file name into the "Save As" dialog, and confirm
pyautogui.hotkey('ctrl', 's')
pyautogui.typewrite('hello1' + '.html')
pyautogui.hotkey('enter')
```
Can somebody please help me understand what I am doing wrong? Please also suggest any alternative Python library that could be used.
To save a page, first obtain the HTML behind the webpage from the driver's `page_source` property. Then open a file with an explicit encoding using `codecs.open`: the file has to be opened in write mode (`"w"`) with the encoding `utf-8`. Finally, use the file's `write` method to write the content obtained from `page_source`.
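A minimal sketch of those steps, assuming `driver` is the logged-in Selenium driver from the question; the function names `save_page_source` and `save_page_mhtml` are hypothetical. The second function is not part of the approach described above: it swaps in the Chrome DevTools command `Page.captureSnapshot` (an experimental command, called through Selenium's `execute_cdp_cmd`, which Chromium-based drivers such as `webdriver.Chrome` provide) to produce a single .mhtml file that bundles the page's assets, because `page_source` alone returns only the HTML.

```python
import codecs


def save_page_source(driver, path):
    """Write the HTML currently loaded in `driver` to `path` as UTF-8.

    Only the driver's `page_source` property is used, so any Selenium
    WebDriver works here.
    """
    html = driver.page_source  # a property, not a method: no parentheses
    with codecs.open(path, "w", encoding="utf-8") as f:
        f.write(html)


def save_page_mhtml(driver, path):
    """Save the page together with its assets as one .mhtml file.

    Assumes a Chromium-based driver that exposes execute_cdp_cmd;
    Page.captureSnapshot is an experimental DevTools command.
    """
    snapshot = driver.execute_cdp_cmd("Page.captureSnapshot", {"format": "mhtml"})
    # newline="" writes the MHTML's CRLF line endings unchanged on Windows.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(snapshot["data"])
```

For example, calling `save_page_source(driver, "hello1.html")` right after the final `.click()` replaces the three `pyautogui` lines and never opens the "Save As" dialog. In Python 3 the built-in `open(path, "w", encoding="utf-8")` works just as well as `codecs.open`. Keep in mind that the saved .html still references its images, stylesheets, and scripts by URL, so it may not render like the server copy offline; the .mhtml file embeds those resources and should open in Chrome as a self-contained page.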