I am currently working on a project that requires the puppeteer package with Node.js.
I first used plain puppeteer in my app, and after a few difficulties I managed to make it work on my server by dockerizing the app.
Here is my Dockerfile:
FROM ghcr.io/puppeteer/puppeteer:22.0.0
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/google-chrome-stable
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm ci
COPY . .
CMD [ "node", "server.js"]
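A quick way to rule out a wrong browser path inside the container is to check, at startup, that the binary set by `PUPPETEER_EXECUTABLE_PATH` in the Dockerfile actually exists. This is only a sketch: `resolveChromePath` is a hypothetical helper, and it assumes the path in the environment variable is the one Chrome should be launched from.

```javascript
const fs = require("fs");

// Hypothetical helper: return the Chrome path configured in the Dockerfile
// (PUPPETEER_EXECUTABLE_PATH) and fail fast if it is missing, instead of
// letting the cluster time out later with a vaguer error.
function resolveChromePath(env = process.env) {
  const chromePath = env.PUPPETEER_EXECUTABLE_PATH;
  if (!chromePath) {
    throw new Error("PUPPETEER_EXECUTABLE_PATH is not set");
  }
  if (!fs.existsSync(chromePath)) {
    throw new Error(`No browser binary found at ${chromePath}`);
  }
  return chromePath;
}
```

Calling `resolveChromePath()` once at the top of `server.js` would make a missing or misnamed binary show up immediately in the Render logs.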
However, the process takes too long on my server, so I wanted to scale my app by visiting all my pages in parallel with puppeteer-cluster: https://github.com/thomasdondorf/puppeteer-cluster
I followed the instructions and everything works fine locally, but it doesn't work at all in my app deployed on Render.com.
Here is how I launch the cluster:
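const { Cluster } = require('puppeteer-cluster');

const cluster = await Cluster.launch({
  // One shared browser; each job runs in its own page
  concurrency: Cluster.CONCURRENCY_PAGE,
  puppeteerOptions: {
    headless: true,
    args: [
      '--no-sandbox',
      '--disable-setuid-sandbox',
    ]
  },
  maxConcurrency: 20, // up to 20 pages open at the same time
  skipDuplicateUrls: true,
});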
I am getting the two errors below. Sometimes no error is shown at all, but the pages were clearly not loaded because I don't get any results:
/usr/src/app/node_modules/puppeteer-cluster/dist/Worker.js:41
throw new Error('Unable to get browser page');
^
Error: Unable to get browser page
    at Worker.<anonymous> (/usr/src/app/node_modules/puppeteer-cluster/dist/Worker.js:41:31)
    at Generator.next (<anonymous>)
    at fulfilled (/usr/src/app/node_modules/puppeteer-cluster/dist/Worker.js:5:58)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Node.js v20.9.0
Requesting main frame too early!
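For the runs that end without an error but also without results, it may help to register a `taskerror` listener: as far as I can tell, puppeteer-cluster reports failures of jobs added with `cluster.queue()` only through that event, so without a listener they are easy to miss. The sketch below shows a full launch/task/queue/idle/close run that records failures. `scrapeTitles`, the `page.title()` extraction and the `goto` options are hypothetical placeholders, and the optional third parameter only exists so a test double can replace the real `Cluster`. Note that this would not catch the uncaught `Unable to get browser page` crash itself.

```javascript
// Sketch of a complete puppeteer-cluster run that records failed jobs
// instead of letting them disappear silently. scrapeTitles and the
// title extraction are hypothetical placeholders.
async function scrapeTitles(
  urls,
  clusterOptions,
  // Only loaded when no implementation is passed in (e.g. a test double).
  ClusterImpl = require("puppeteer-cluster").Cluster
) {
  const cluster = await ClusterImpl.launch(clusterOptions);
  const results = [];
  const failures = [];

  // "taskerror" fires when a queued job throws; logging it shows which
  // URLs failed even when the process does not crash.
  cluster.on("taskerror", (err, url) => {
    failures.push({ url, message: err.message });
    console.error(`Failed on ${url}: ${err.message}`);
  });

  await cluster.task(async ({ page, data: url }) => {
    await page.goto(url, { waitUntil: "domcontentloaded" });
    results.push({ url, title: await page.title() });
  });

  for (const url of urls) {
    cluster.queue(url);
  }

  // Wait for every queued job to finish or fail, then shut the browser down.
  await cluster.idle();
  await cluster.close();
  return { results, failures };
}
```

In the real app this would be called as `await scrapeTitles(urls, { concurrency: Cluster.CONCURRENCY_PAGE, maxConcurrency: 4 })`, and the returned `failures` array shows which pages never loaded.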
I tried running my app in debug mode (https://github.com/thomasdondorf/puppeteer-cluster?tab=readme-ov-file#debugging), and I get this output:
2024-02-28T14:53:53.066Z puppeteer-cluster: Worker Error getting browser page (try: 0), message: Timeout hit: 5000
2024-02-28T14:53:53.066Z puppeteer-cluster: SingleBrowserImpl Repair requested
2024-02-28T14:53:53.066Z puppeteer-cluster: SingleBrowserImpl Starting repair
2024-02-28T14:53:53.166Z puppeteer-cluster: Worker Error getting browser page (try: 0), message: Timeout hit: 5000
2024-02-28T14:53:53.166Z puppeteer-cluster: SingleBrowserImpl Repair requested
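The debug output shows workers repeatedly failing to get a page within 5000 ms (`Timeout hit: 5000`) and requesting a browser repair. One possible, unconfirmed explanation is that opening up to 20 pages at once in a single Chrome is more than the Render instance can handle. A sketch of gentler cluster options to test that theory follows; `buildClusterOptions` is a hypothetical helper, and while `maxConcurrency`, `workerCreationDelay`, `timeout` and `retryLimit` are documented puppeteer-cluster options, the values here are guesses, not settings tested on Render.

```javascript
// Hypothetical helper: cluster options for a small container. All numbers
// are starting points to experiment with, not measured values.
function buildClusterOptions(concurrency, puppeteerOptions, maxConcurrency = 4) {
  return {
    concurrency,              // e.g. Cluster.CONCURRENCY_PAGE
    maxConcurrency,           // far fewer simultaneous pages than 20
    workerCreationDelay: 200, // wait 200 ms between starting workers
    timeout: 60000,           // per-task timeout in ms (library default: 30000)
    retryLimit: 1,            // retry a failed job once
    skipDuplicateUrls: true,
    puppeteerOptions,
  };
}
```

It would be passed straight to the launcher, e.g. `Cluster.launch(buildClusterOptions(Cluster.CONCURRENCY_PAGE, { headless: true, args: ['--no-sandbox', '--disable-setuid-sandbox'] }))`, raising `maxConcurrency` step by step until the timeouts come back.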
For reference, here is how I used plain puppeteer, which was working:
const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  args: [
    '--disable-gpu',
    '--disable-dev-shm-usage',
    '--disable-setuid-sandbox',
    '--no-first-run',
    '--no-sandbox',
    '--no-zygote',
    '--deterministic-fetch',
    '--disable-features=IsolateOrigins',
    '--disable-site-isolation-trials',
  ],
  headless: true,
  // In the Docker image, use the Chrome path set in the Dockerfile;
  // locally, use the browser that puppeteer itself is configured with.
  executablePath:
    process.env.NODE_ENV === "production"
      ? process.env.PUPPETEER_EXECUTABLE_PATH
      : puppeteer.executablePath(),
});
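One visible difference is that the cluster config only passes two flags, while the working setup also passes `--disable-dev-shm-usage`, among others. Puppeteer's troubleshooting guide notes that Docker's default 64 MB `/dev/shm` is often too small for Chrome, and that flag is its suggested workaround; whether that is the cause here is unverified. A sketch that builds one options object usable both for `puppeteer.launch()` and as the cluster's `puppeteerOptions`; `buildLaunchOptions` and `LAUNCH_ARGS` are hypothetical names, and `fallbackPath` stands in for `puppeteer.executablePath()` so the helper itself does not depend on puppeteer.

```javascript
// Launch flags copied from the plain puppeteer setup that works on Render.
const LAUNCH_ARGS = [
  "--disable-gpu",
  "--disable-dev-shm-usage",
  "--disable-setuid-sandbox",
  "--no-first-run",
  "--no-sandbox",
  "--no-zygote",
  "--deterministic-fetch",
  "--disable-features=IsolateOrigins",
  "--disable-site-isolation-trials",
];

// Hypothetical helper: the same launch options for puppeteer.launch()
// and for Cluster.launch({ puppeteerOptions }).
function buildLaunchOptions(env, fallbackPath) {
  return {
    headless: true,
    args: [...LAUNCH_ARGS], // copy so callers can append flags safely
    executablePath:
      env.NODE_ENV === "production" ? env.PUPPETEER_EXECUTABLE_PATH : fallbackPath,
  };
}
```

Both launches could then share it, e.g. `puppeteer.launch(buildLaunchOptions(process.env, puppeteer.executablePath()))` and `Cluster.launch({ concurrency: Cluster.CONCURRENCY_PAGE, maxConcurrency: 4, puppeteerOptions: buildLaunchOptions(process.env, puppeteer.executablePath()) })`, so the cluster starts Chrome exactly the way the working setup does.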
I also tried passing the same executablePath when launching puppeteer-cluster, but it still doesn't work.
It's my first time writing a Node.js app, so maybe I missed something, but I can't find what. The puppeteer-cluster package is installed correctly, since everything works perfectly locally; I just can't find a way to make it work on Render. Maybe I need to add puppeteer-cluster to my Dockerfile somehow, but I don't know how, and after testing many things I still haven't found a solution. Any help would be appreciated!