I need to automate extracting information from websites in Python 3, using the selenium module to drive Chrome (through ChromeDriver) with the uBlock extension loaded.
The code runs on a remote machine that has no GUI, so I use xvfb-run to simulate a desktop environment in which Chrome launches with a specific window size.
The remote machine has the following Debian operating system:
uname -a
Linux mem 4.19.0-10-amd64 #1 SMP Debian 4.19.132-1 (2020-07-24) x86_64 GNU/Linux
These were the steps I took to configure my environment and code on the remote machine:
1 - To configure my environment, I installed this version of Google Chrome:
google-chrome --version
Google Chrome 86.0.4240.111
2 - I checked the installed versions of Python 3 and selenium:
python --version
Python 3.7.3
pip freeze
selenium==3.141.0
3 - I checked the version of the xvfb package, which provides xvfb-run:
apt-cache policy xvfb
2:1.20.4-1+deb10u1
4 - With these packages in place, I downloaded chromedriver_linux64.zip from the list below. Version 86.0.4240.22 is the most recent release with the same major version as the installed google-chrome:
https://chromedriver.storage.googleapis.com/index.html
https://chromedriver.storage.googleapis.com/index.html?path=86.0.4240.22/
5 - To use the uBlock extension with Chrome, I needed a .crx archive of it, so I installed an extension that can produce .crx files from other installed extensions. For this, I used CRX Extractor/Downloader:
https://chrome.google.com/webstore/detail/crx-extractordownloader/ajkhmmldknmfjnmeedkbkkojgobmljda
6 - Using that extension, I obtained the ublock.crx file to test with.
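For reference, the major-version match described in step 4 can be double-checked with a few lines of Python. This is only a minimal sketch: the helper names major_version and same_major_version are made up for illustration, and the version strings are the ones quoted in the steps above.

```python
import re


def major_version(version_text):
    """Return the major version from text such as 'Google Chrome 86.0.4240.111'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def same_major_version(chrome_text, driver_text):
    """True when Chrome and chromedriver report the same major version."""
    return major_version(chrome_text) == major_version(driver_text)


if __name__ == "__main__":
    # Strings copied from the google-chrome output and the chromedriver listing.
    print(same_major_version("Google Chrome 86.0.4240.111", "86.0.4240.22"))
```

With the versions listed above, both sides resolve to major version 86, so a version mismatch does not look like the cause here.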
I managed to use the binary in chromedriver_linux64.zip without the extension to launch a Chrome instance and do some basic crawling.
But when I tried to load ublock.crx from my code (shown further down, after the exception), the session could not be created and I got an exception.
This is the exception produced:
selenium.common.exceptions.SessionNotCreatedException: Message: session not created: cannot process extension #1
from unknown error: cannot unzip
I am launching it from my program like this:
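from selenium import webdriver

# ublock_crx_file_path and driver_path are set elsewhere in my program
option = webdriver.ChromeOptions()
option.add_extension(ublock_crx_file_path)  # the packed .crx file from step 6
# the SessionNotCreatedException is raised by this call
driver = webdriver.Chrome(executable_path=driver_path, options=option)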
I have made sure that ublock_crx_file_path is a valid path and points to the file I obtained from Chrome.
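Since the message says "cannot unzip", one check that might be worth running is whether the file is really a CRX container with a readable ZIP payload, and not, say, a truncated download. The sketch below assumes the commonly documented CRX layout: a "Cr24" magic, a little-endian version field, a header whose length fields differ between CRX2 and CRX3, and then a plain ZIP archive. inspect_crx is a hypothetical helper, not part of selenium, and it does not verify signatures.

```python
import io
import struct
import zipfile


def inspect_crx(path):
    """Report whether a file looks like a CRX and whether its ZIP payload opens."""
    with open(path, "rb") as f:
        data = f.read()

    info = {
        "size": len(data),
        "magic_ok": data[:4] == b"Cr24",  # every CRX starts with this magic
        "version": None,
        "zip_ok": False,
    }
    if not info["magic_ok"] or len(data) < 12:
        return info

    (version,) = struct.unpack("<I", data[4:8])
    info["version"] = version

    if version == 2:
        # Assumed CRX2 layout: public key length, signature length, then both blobs.
        if len(data) < 16:
            return info
        pub_len, sig_len = struct.unpack("<II", data[8:16])
        zip_start = 16 + pub_len + sig_len
    elif version == 3:
        # Assumed CRX3 layout: a single header length, then the header itself.
        (header_len,) = struct.unpack("<I", data[8:12])
        zip_start = 12 + header_len
    else:
        return info

    try:
        with zipfile.ZipFile(io.BytesIO(data[zip_start:])) as archive:
            # testzip() returns None when every member passes its CRC check.
            info["zip_ok"] = archive.testzip() is None
    except zipfile.BadZipFile:
        info["zip_ok"] = False
    return info
```

Calling inspect_crx(ublock_crx_file_path) and printing the result would show whether the magic, the version and the ZIP payload look sane. If magic_ok or zip_ok comes back False, the file itself is suspect and not the selenium code.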
Hopefully someone can shed light on this?