How to correctly manage browser instances/contexts in Puppeteer between renders to avoid persistent TCP connections?


Currently, our project spawns 4 browser instances in production, each with its own incognito context. We reuse them through a pool: a context is taken from the pool, used for a render, and placed back. Is it better to close the context after each render and return a fresh context to the pool every time, or is that too much of a performance hit? Otherwise, what is a good way to handle multiple browser instances/contexts?

// factory method
public async create(): Promise<BrowserContext> {
    const browser = await launch(processedOptions);
    // Launching a browser automatically creates a default BrowserContext (a new tab in the default browser).
    // We used to close that tab manually, since we create a new incognito browser context to use anyway.
    // Since the upgrade to Puppeteer v20+, uncommenting the next line leaves the Docker container stuck
    // when creating a new page from the incognito browser context.
    // await (await browser.pages())[0].close();
    return browser.createIncognitoBrowserContext();
}

We run the code above 4 times to create these instances in production. I am not entirely sure this is the right approach, so I would appreciate any input.
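To make the "fresh context per render" option concrete, here is a minimal sketch in which the browsers stay in the pool and only the incognito context is created and closed for each render. It is an illustration under stated assumptions, not a tested production setup: `BrowserPool`, `renderWithPool`, `BrowserLike` and `ContextLike` are hypothetical names, and the two interfaces only mirror the `createIncognitoBrowserContext()` and `close()` calls that Puppeteer v20 exposes, so that the sketch does not depend on Puppeteer itself.

```typescript
// Structural stand-ins for the Puppeteer objects (hypothetical names).
// They only describe the two calls this sketch relies on.
interface ContextLike {
  close(): Promise<void>;
}
interface BrowserLike<C extends ContextLike> {
  createIncognitoBrowserContext(): Promise<C>;
}

// Hypothetical pool that holds long-lived browsers, not contexts.
class BrowserPool<B> {
  private idle: B[];
  private waiters: Array<(browser: B) => void> = [];

  constructor(browsers: B[]) {
    this.idle = [...browsers];
  }

  // Resolves immediately if a browser is idle, otherwise waits for a release.
  acquire(): Promise<B> {
    const browser = this.idle.pop();
    if (browser !== undefined) {
      return Promise.resolve(browser);
    }
    return new Promise<B>((resolve) => this.waiters.push(resolve));
  }

  // Hands the browser to the oldest waiter, or marks it idle again.
  release(browser: B): void {
    const next = this.waiters.shift();
    if (next) {
      next(browser);
    } else {
      this.idle.push(browser);
    }
  }
}

// Runs one render in a fresh incognito context and always closes that
// context afterwards, even if the render throws.
async function renderWithPool<C extends ContextLike, T>(
  pool: BrowserPool<BrowserLike<C>>,
  render: (context: C) => Promise<T>,
): Promise<T> {
  const browser = await pool.acquire();
  try {
    const context = await browser.createIncognitoBrowserContext();
    try {
      return await render(context);
    } finally {
      await context.close();
    }
  } finally {
    pool.release(browser);
  }
}
```

With Puppeteer, the pool would be filled with the 4 browsers returned by `launch(...)`, and the `render` callback would call `context.newPage()` and do the actual rendering. Whether the extra context creation per render is an acceptable cost is something that would have to be measured.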

Furthermore, I do not need the default browser context that Puppeteer creates at launch, because I immediately create an incognito browser context and use that one instead. The default browser context nevertheless stays alive in the background. Can it be closed (and how?), so that only the incognito one remains in use? There is some more information in the comments in the code.

Another reason for asking is that we see some TCP connections to Google that remain active even after closing the page of the incognito browser context that was used for the render. In production we never stop the server, so the default browser context is always there in the background for all 4 browser instances.
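One way to narrow down where these connections come from is to record which hosts each rendered page actually requests. The sketch below is a diagnostic aid only, and it rests on an assumption: that at least some of the connections are opened by the rendered pages themselves. Traffic that Chrome generates on its own, outside of any page, would not show up here. `trackRequestHosts`, `hostsMatching`, `PageLike` and `RequestLike` are hypothetical names; the interfaces mirror only Puppeteer's `page.on('request', ...)` event and `request.url()`, and may need adapting to the real `Page` type.

```typescript
// Structural stand-ins for the Puppeteer objects (hypothetical names).
interface RequestLike {
  url(): string;
}
interface PageLike {
  on(event: "request", handler: (request: RequestLike) => void): unknown;
}

// Counts the requests a page issues, keyed by hostname.
// The returned map keeps filling up for as long as the page emits requests.
function trackRequestHosts(page: PageLike): Map<string, number> {
  const counts = new Map<string, number>();
  page.on("request", (request) => {
    let host: string;
    try {
      host = new URL(request.url()).hostname;
    } catch {
      return; // skip URLs that cannot be parsed
    }
    if (host === "") {
      return; // e.g. data: URLs have no hostname
    }
    counts.set(host, (counts.get(host) ?? 0) + 1);
  });
  return counts;
}

// Returns the recorded hosts that equal, or are subdomains of, any given suffix.
function hostsMatching(counts: Map<string, number>, suffixes: string[]): string[] {
  return [...counts.keys()].filter((host) =>
    suffixes.some((suffix) => host === suffix || host.endsWith("." + suffix)),
  );
}
```

After a render, calling `hostsMatching(counts, [...])` with the domains of interest would show whether the page itself contacted them. The suffix list is left to the caller, since the exact Google hosts involved are not known here.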

puppeteer-core version: 20.7.4

Chrome for Testing: 114.0.5735.133 (Official Build), the version associated with this Puppeteer version.

As mentioned in the commented-out code above:

// await (await browser.pages())[0].close();

I tried to close the page of the default browser context, but this does not work when deployed through Docker, and I am fairly sure it would not solve the issue of persistent TCP connections to Google APIs anyway.

This has become a problem with one particular client, for whom many TCP connections to Google services persist. This may cause problems with our hosting service, as it has in the past.

Another thing we have tried is to close the browser completely and launch a new one for each render, but that is too slow and can increase costs considerably when a domain has thousands or tens of thousands of pages.
