Running two Selenium driver instances in parallel, but they both operate on the same Chrome window


I am trying to run two driver instances (RemoteWebDriver) in parallel, using Selenium standalone or Grid, to operate on two different windows/websites at the same time.

Two sessions and two windows are created, but the second driver operates on the same window as the first driver.

What I've been trying:

Run Selenium standalone:

docker run -p 4444:4444 -p 7900:7900 --shm-size="2g" -e SE_NODE_GRID_URL="http://localhost:4444" -e SE_NODE_MAX_SESSIONS=5 -e SE_NODE_OVERRIDE_MAX_SESSIONS=true -e SE_VNC_NO_PASSWORD=1 selenium/standalone-chrome:4.16.1-20231212

Create and use the two drivers:

    private fun chromeOptions(
        profileName: String
    ): ChromeOptions {
        return ChromeOptions().apply {
            setCapability(ChromeOptions.CAPABILITY, this)

            addArguments("--allow-insecure-localhost")
            addArguments("--crash-dumps-dir=/tmp")
            addArguments("--remote-debugging-port=9222")

            addArguments("--user-data-dir=/tmp/chrome")
            addArguments("--profile-directory=$profileName")
        }
    }
    suspend fun crawl() {
        val driverBBC = RemoteWebDriver(
            URL("http://localhost:4444/wd/hub"),
            chromeOptions("first")
        )

        val driverNYT = RemoteWebDriver(
            URL("http://localhost:4444/wd/hub"),
            chromeOptions("second")
        )

        try {
            driverBBC.manage().window().maximize()
            driverBBC.get("https://www.bbc.com/")
            println("bbc title: ${driverBBC.title}")

            delay(5000)
            driverNYT.manage().window().maximize()
            driverNYT.get("https://www.nytimes.com/")
            println("bbc title: ${driverBBC.title}")
            println("nyt title: ${driverNYT.title}")

            // delay before closing all
            delay(60000)
        } catch (e: Exception) {
            logger.error("error while scraping", e)
        } finally {
            driverBBC.quit()
            driverNYT.quit()
        }
    }

What I expected to happen:

bbc title: BBC - Homepage
bbc title: BBC - Homepage
nyt title: The New York Times - Breaking News, US News, World News and Videos

What actually happened:

bbc title: BBC - Homepage
bbc title: The New York Times - Breaking News, US News, World News and Videos
nyt title: The New York Times - Breaking News, US News, World News and Videos

There are 2 best solutions below

Daniel Perez Efremova answered:

It seems that you are printing the first driver's title in the second print statement. Try the following:

        try {
            driverBBC.manage().window().maximize()
            driverBBC.get("https://www.bbc.com/")
            println("bbc title: ${driverBBC.title}")

            delay(5000)
            driverNYT.manage().window().maximize()
            driverNYT.get("https://www.nytimes.com/")
            println("bbc title: ${driverNYT.title}") // changed: read the title from driverNYT
            println("nyt title: ${driverNYT.title}")

            // delay before closing all
            delay(60000)
        }
        // catch/finally blocks unchanged from the question

With that point fixed, your code seems to work as expected.

qwerew answered:

I finally found that this was caused by the following argument:

addArguments("--remote-debugging-port=9222")

After removing it and also removing these params:

addArguments("--user-data-dir=/tmp/chrome")
addArguments("--profile-directory=$profileName")

everything works as expected, even in much more complex web scraping and crawling jobs :D
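
A possible explanation, which is an assumption and not confirmed in this thread: ChromeDriver talks to Chrome over the DevTools debugging port, so pinning every session to port 9222 may let the second driver attach to the first browser, and sharing one --user-data-dir between sessions may have a similar effect. Below is a minimal Kotlin sketch of building per-session arguments under that assumption. The names chromeArguments and isolateProfile are hypothetical helpers for illustration, not part of Selenium, and the /tmp path prefix is only an example.

```kotlin
import java.util.UUID

// Hypothetical helper (not from the original post): builds the Chrome
// arguments for one session without pinning a debugging port and without
// sharing a profile directory between sessions.
fun chromeArguments(sessionName: String, isolateProfile: Boolean = false): List<String> {
    val args = mutableListOf(
        "--allow-insecure-localhost",
        "--crash-dumps-dir=/tmp"
    )
    // Deliberately no --remote-debugging-port here: a fixed port would be
    // requested by every browser started on the same node.
    if (isolateProfile) {
        // Only a path string is generated; the browser is expected to create
        // the directory on the machine where it actually runs.
        args += "--user-data-dir=/tmp/chrome-$sessionName-${UUID.randomUUID()}"
    }
    return args
}

fun main() {
    println(chromeArguments("first"))
    println(chromeArguments("second", isolateProfile = true))
}
```

The returned list could then be passed to ChromeOptions().addArguments(...) in place of the individual addArguments calls in the question. Leaving isolateProfile off matches what the answer above describes (no profile arguments at all); turning it on is an untested variation for cases where a separate profile per session is needed.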