We're building a backend that needs to make requests to a separate service, but that service can only handle X concurrent requests. How can I set up a queue or other architecture to throttle the incoming requests?
The idea is that the service should only be processing X requests at a time. Once a request finishes, the next request in the queue should be sent through. The queue also needs to return a response to the backend.
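For concreteness, here's a minimal sketch of the behaviour we're after within a single process (`callService` is a hypothetical stand-in for the actual HTTP call to the downstream service). The hard part is enforcing this same limit globally once the backend runs as multiple instances:

```js
const MAX_CONCURRENT = 5; // the "X" from above
let active = 0;
const waiting = [];

// Wrap a task so it only runs when a concurrency slot is free.
// The caller still gets an ordinary promise back with the response.
function throttled(task) {
  return new Promise((resolve, reject) => {
    waiting.push({ task, resolve, reject });
    drain();
  });
}

// Start queued tasks while slots are available; when one finishes,
// free its slot and pull the next request off the queue.
function drain() {
  while (active < MAX_CONCURRENT && waiting.length > 0) {
    const { task, resolve, reject } = waiting.shift();
    active++;
    task()
      .then(resolve, reject)
      .finally(() => {
        active--;
        drain();
      });
  }
}

// Usage (callService is a placeholder, not a real helper):
// throttled(() => callService(payload)).then(handleResponse);
```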
I've looked at using RabbitMQ for this, essentially having a worker on the other side of the queue make the request and return a response. The problem is that I need to be able to scale these workers out while limiting concurrency globally, and RabbitMQ only supports limiting concurrency at the per-worker level (via prefetch).
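This is roughly what the worker looks like with `amqplib` (again, `callService` is a placeholder for the real call). `prefetch(1)` caps unacked messages per consumer, so running N of these workers means N requests in flight, and there's no single knob that caps the total across all of them:

```js
const amqp = require('amqplib');

async function startWorker() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('requests');
  await ch.prefetch(1); // at most 1 unacked message *per worker*, not globally

  ch.consume('requests', async (msg) => {
    const payload = JSON.parse(msg.content.toString());
    const result = await callService(payload); // placeholder for the real HTTP call

    // RPC-style reply so the backend gets the response back
    ch.sendToQueue(
      msg.properties.replyTo,
      Buffer.from(JSON.stringify(result)),
      { correlationId: msg.properties.correlationId }
    );
    ch.ack(msg);
  });
}

startWorker();
```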
Any other ideas? We're using NodeJS on the backend, so anything that works well with it is ideal.