We are using a Django application (Saleor, https://github.com/saleor/saleor) to handle our e-commerce use cases. In production we run it on ASGI with Uvicorn, 4 workers per instance (see the launch sketch after the list). Infra setup:
- 4 instances of 4-core, 16 GB machines hosting the Django application (Saleor).
- The app is deployed with Docker on all instances.
- 2 instances of 4-core, 16 GB machines for Celery.
- Hosted PostgreSQL solution with one primary and one replica.
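For context, the launch inside each container looks roughly like this. This is a simplified sketch; the `saleor.asgi:application` path and the port are placeholders, not our exact files:

```python
# run.py -- simplified sketch of the Uvicorn launch with 4 worker processes,
# matching the "ASGI with Uvicorn, 4 workers" setup described above.
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "saleor.asgi:application",  # assumed ASGI entry point; adjust to your module path
        host="0.0.0.0",
        port=8000,
        workers=4,  # one worker process per core on the 4-core machines
    )
```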
Saleor uses Django and Graphene to implement its GraphQL APIs. One of the APIs is createCheckout, which takes around 150–250 ms depending on the entities in the payload. In a load test with 1 user, the API consistently stays in that range. When the number of concurrent users increases to 10, latencies grow roughly four-fold (1–1.3 s). With 20 users, they exceed 10 seconds.
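The load test is roughly the following (simplified sketch; the mutation body and the `/graphql/` path are placeholders, the real createCheckout payload is larger):

```python
# locustfile.py -- minimal sketch of the load test described above.
from locust import HttpUser, task, between

CREATE_CHECKOUT = """
mutation {
  checkoutCreate(input: { channel: "default-channel", lines: [] }) {
    checkout { id }
    errors { field message }
  }
}
"""

class CheckoutUser(HttpUser):
    wait_time = between(1, 2)

    @task
    def create_checkout(self):
        self.client.post("/graphql/", json={"query": CREATE_CHECKOUT})
```

Run with e.g. `locust -f locustfile.py --headless -u 20 -r 5 --host https://<our-host>` to reproduce the 20-user case.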
Average CPU usage doesn't exceed 60%. While tracing the latencies, we found that the core API code still takes no more than 150–250 ms even with 20 users making concurrent requests. This suggests the extra latency is being added at the ASGI + Uvicorn layer.
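One thing we could measure to confirm this is how long each request sits between the reverse proxy and the Django/Graphene code. A rough sketch of the kind of ASGI middleware that would show this (the `X-Request-Start` header name/format is an assumption and has to be injected by the proxy; the `saleor/asgi.py` path is also an assumption):

```python
# Sketch: log queue time in front of the app vs. time spent inside the app.
import time

class TimingMiddleware:
    def __init__(self, app):
        self.app = app

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Time spent queued before the ASGI app saw the request
        # (requires the proxy to set "X-Request-Start: t=<unix ms>").
        headers = {k: v for k, v in scope.get("headers", [])}
        raw = headers.get(b"x-request-start")
        if raw:
            try:
                proxy_ms = float(raw.decode().removeprefix("t="))
                print(f"queued before app: {time.time() * 1000 - proxy_ms:.1f} ms")
            except ValueError:
                pass

        # Time spent inside Django/Graphene itself.
        start = time.perf_counter()
        await self.app(scope, receive, send)
        print(f"handled in app: {(time.perf_counter() - start) * 1000:.1f} ms")

# e.g. wrap the existing app in saleor/asgi.py:
# application = TimingMiddleware(application)
```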
Not sure what we are missing here. From the deployment perspective, we have followed the standard Django + ASGI + Uvicorn setup for production. Any help or suggestions in this regard would be appreciated.
We had similar issues when our Saleor setup did not include the Celery worker. Can you make sure that Celery is connected via Redis and that it processes a task for every checkout as expected? If not, Saleor cannot run its async tasks and falls back to running them synchronously, which adds a lot of latency.
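The broker and "eager" settings are the first things I'd check. Rough sketch of what to look for (setting names follow the usual Django + Celery conventions; confirm them against your Saleor version's settings.py and your environment variables):

```python
import os

# The broker must point at your Redis instance, and the Celery workers on the
# separate machines must be started with the same URL.
CELERY_BROKER_URL = os.environ.get("CELERY_BROKER_URL", "redis://redis:6379/1")

# If tasks run "eagerly" they execute inline inside the web request instead of
# on the Celery machines, which adds exactly the kind of latency you describe.
# This should be False in production.
CELERY_TASK_ALWAYS_EAGER = False
```

You can also check that the workers respond with `celery -A <your celery app module> inspect ping` and watch the worker logs while creating a checkout to confirm tasks actually arrive there.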