I've migrated an event-triggered Google Cloud Function to 2nd gen and I'm running into a lot of problems. The function generates a PDF using Puppeteer. I've solved all the issues related to installing Puppeteer, and I've added await before every async call so that each step finishes before the next one starts.
Even so, I reach a point where a call that renders a Handlebars template takes more than 1 minute, versus 0.5 s in the 1st gen version. It's a synchronous operation. The subsequent call to Puppeteer to print the PDF then fails.
I've found a workaround by turning on the "CPU always allocated" flag in Cloud Run, but I'm afraid it might be too expensive, and I don't see why 2nd gen can't behave like 1st gen functions. I've configured each instance to receive no more than 1 request at a time. What should I expect when more requests are received? Would each one cold-start a new instance? How long would each instance last? Are instances automatically terminated if no traffic is received? If so, what does "CPU always allocated" mean?
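"Would each one cold-start a new instance?" Yes, whenever every existing instance is already busy with its one request. An idle instance is reused if one is available. That's how managed serverless backends work. See the documentation for details.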
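As a rough illustration of that scaling behavior, here is a minimal JavaScript sketch. It is a simplified model, not the platform's actual autoscaling algorithm: it only looks at concurrent requests, the per-instance concurrency limit, and a maximum instance count, and the function names and the limit of 100 instances are hypothetical.

```javascript
// Simplified model (assumption): ignores CPU-based scaling signals,
// startup latency, and request queuing.
function instancesNeeded(concurrentRequests, concurrencyPerInstance, maxInstances) {
  if (concurrencyPerInstance < 1) {
    throw new RangeError('concurrencyPerInstance must be at least 1');
  }
  const wanted = Math.ceil(concurrentRequests / concurrencyPerInstance);
  return Math.min(wanted, maxInstances);
}

// Number of new instances that would have to cold-start, given how many
// instances already exist (assumed to be idle and reusable).
function coldStarts(concurrentRequests, existingInstances, concurrencyPerInstance, maxInstances) {
  const needed = instancesNeeded(concurrentRequests, concurrencyPerInstance, maxInstances);
  return Math.max(0, needed - existingInstances);
}

// 5 simultaneous requests, 2 existing instances, concurrency 1, at most 100 instances.
console.log(coldStarts(5, 2, 1, 100)); // 3
```

With a concurrency of 1, every request beyond the number of available instances implies one more cold start, until the maximum instance count is reached.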
"How long would each instance last?" As long as the cloud provider decides to keep them. You don't get to configure that directly. By using a managed service, you are delegating that decision to the provider.
"Are instances automatically terminated if no traffic is received?" Yes, that's how managed serverless backends work. The documentation says "An instance will never stay idle for more than 15 minutes after processing a request unless it is kept active using minimum instances."
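As for what "CPU always allocated" means, start with the documentation.
The documentation suggests that, by default, the CPU of an instance can be throttled when that instance is not currently processing any request, which is intended to save resources. With "CPU always allocated", the CPU stays available for the whole lifetime of the instance, including the time between requests, and you are billed accordingly.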
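One practical consequence when the CPU is only allocated during request processing: any work still running after the handler has signaled completion may be throttled. The following is a minimal JavaScript sketch of a handler that resolves only after every step has finished. It assumes a Node.js event handler; renderHtml, printPdf, and handleEvent are hypothetical stand-ins for the real template render, the Puppeteer print, and the function entry point.

```javascript
// Stand-in for the real template render (assumption: the real call is synchronous).
function renderHtml(data) {
  return `<h1>${data.title}</h1>`;
}

// Stand-in for the real PDF print (assumption: the real call is asynchronous).
async function printPdf(html) {
  // Simulates asynchronous work such as a headless-browser print.
  await new Promise((resolve) => setTimeout(resolve, 10));
  return Buffer.from(html, 'utf8');
}

// Hypothetical event handler: the returned promise resolves only after
// both steps have completed, so nothing is left running in the background.
async function handleEvent(event) {
  const html = renderHtml(event.data);
  const pdf = await printPdf(html);
  return pdf.length;
}

handleEvent({ data: { title: 'Invoice' } }).then((bytes) => {
  console.log(`PDF stand-in has ${bytes} bytes`);
});
```

If a step were started without await, or completion were signaled through an early callback or response, the remaining work would run outside the request and could slow down dramatically, which may be worth ruling out before paying for always-allocated CPU.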