I am running a NestJS application hosted on AWS EC2 (with Elastic Beanstalk). It was running fine until a couple of days ago, but now it is intermittently crashing, with numerous "Connection timed out" errors in my Nginx error log:
1892#1892: *884 upstream timed out (110: Connection timed out) while reading response header from upstream, client: {client_ip}, server: localhost, request: "GET {api_endpoint} HTTP/1.1", upstream: "http://127.0.0.1:3000/{api_endpoint}", host: "{server_host}", referrer: "{server_url}"
On further diagnosis, I noticed that the delay occurs between two global guards: AuthGuard and PermissionsGuard. The AuthGuard (executed first) receives the request and completes in a matter of milliseconds, but the PermissionsGuard only receives the request 50 to 60 seconds after the AuthGuard has finished. Hence, by the time my controller receives the request, more than 60 seconds have already passed.
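For anyone trying to reproduce the measurement, here is a minimal sketch of how the gap between two sequential guards could be timed. It is a simplified stand-in, not the application's real guards: `GuardFn`, `GuardTiming`, `runGuardsWithTiming` and `gapsBetweenGuards` are hypothetical names, and each guard is reduced to a plain async function rather than a NestJS `canActivate` implementation.

```typescript
// Hypothetical, simplified guard shape: takes a request id, resolves to allow/deny.
type GuardFn = (requestId: string) => Promise<boolean>;

interface GuardTiming {
  name: string;
  startedAt: number;  // epoch milliseconds when the guard was entered
  finishedAt: number; // epoch milliseconds when the guard resolved
}

// Runs the guards one after another (as global guards run in sequence)
// and records when each one started and finished.
async function runGuardsWithTiming(
  requestId: string,
  guards: Array<[string, GuardFn]>,
): Promise<GuardTiming[]> {
  const timings: GuardTiming[] = [];
  for (const [name, guard] of guards) {
    const startedAt = Date.now();
    const allowed = await guard(requestId);
    const finishedAt = Date.now();
    timings.push({ name, startedAt, finishedAt });
    if (!allowed) {
      break; // a denied request never reaches the next guard
    }
  }
  return timings;
}

// Time spent *between* guards: start of each guard minus end of the previous one.
function gapsBetweenGuards(timings: GuardTiming[]): number[] {
  const gaps: number[] = [];
  let previous: GuardTiming | undefined;
  for (const current of timings) {
    if (previous !== undefined) {
      gaps.push(current.startedAt - previous.finishedAt);
    }
    previous = current;
  }
  return gaps;
}
```

In the situation described above, the single gap reported between AuthGuard and PermissionsGuard would be in the range of 50,000 to 60,000 ms, while each guard's own duration (`finishedAt - startedAt`) stays small.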
This application has an HTTP listener (for my APIs) and a couple of socket listeners (using Node's net module) for my IoT service.
As the issue is intermittent, I have been unable to recreate or diagnose the problem. None of these actions seems to have any impact on the timeouts:
- Restarting the application
- Deleting the EC2 instance and deploying the application on a new EC2 instance (Elastic Beanstalk immutable deployments)
- Changing the Nginx timeout (client_header_timeout, client_body_timeout, keepalive_timeout) from 60 to 120
I am unable to figure out what is causing this delay between the two global guards in NestJS.