The only thing my Celery task is doing is making an API request and sending the response back to a Redis queue. What I'd like to achieve is to utilize as many resources as possible by executing tasks in a coroutine-like fashion. This way, every time a coroutine hits requests.post(), the context switcher can switch and allocate resources to another coroutine to send one more request, and so forth.
As I understand, to achieve this, my worker has to run with a gevent execution pool:
celery worker --app=worker.app --pool=gevent --concurrency=500
But that doesn't solve the problem on its own. I have found that, for it to work as expected, monkey patching is (probably) also required:
@app.task
def task_make_request(payload):
    import gevent.monkey
    gevent.monkey.patch_all()
    requests.post('url', payload)
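One way to check whether patching actually takes effect is to inspect the standard-library primitives after calling patch_all(). The sketch below (assuming only that gevent is installed) shows that patch_all() swaps in gevent's cooperative socket implementation; this is also why patching must happen before requests (and its urllib3/socket machinery) is imported, not inside the task body:

```python
# A minimal sketch, assuming gevent is installed.
import gevent.monkey
gevent.monkey.patch_all()

# Import AFTER patching: modules imported earlier would have already
# bound references to the original blocking primitives.
import socket

# After patch_all(), socket.socket is gevent's cooperative version,
# so its module path points into the gevent package.
print(socket.socket.__module__)  # e.g. 'gevent._socket3'
```

Any library that does I/O through the standard socket module (requests included) will then yield to the gevent hub while waiting on the network.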
The questions:
- Is gevent the only execution pool that can be used for this goal?
- Will patch_all make requests.post() asynchronous, so that the context switcher can allocate resources to other coroutines?
- What is the preferred way of achieving cooperative multitasking behavior for Celery tasks with a single I/O-bound operation (an API call)?
When you run under the gevent worker, monkey patching happens almost immediately (see: celery.__init__), and does not need to be repeated. This will patch the threading and related concurrency modules. You can inspect this if you get creative fishing in the requests library dynamically at runtime (an exercise left to the reader).

You can also use the eventlet worker; there is a webscraping example in the Celery repository: here
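To see the cooperative behavior the question is after, a small standalone demo (assuming gevent is installed; time.sleep stands in for the blocking wait inside requests.post()) shows that patched blocking calls overlap instead of running serially:

```python
# A sketch, assuming gevent is installed. time.sleep is a stand-in
# for the network wait of an API call.
import gevent.monkey
gevent.monkey.patch_all()

import time
import gevent

def fake_request(i):
    # After patch_all(), time.sleep yields to the gevent hub, letting
    # other greenlets run while this one "waits on I/O".
    time.sleep(0.1)
    return i

start = time.time()
jobs = [gevent.spawn(fake_request, i) for i in range(100)]
gevent.joinall(jobs)
elapsed = time.time() - start
# Roughly 0.1s total rather than 100 * 0.1 = 10s, because the
# sleeps overlap cooperatively.
print(f"{elapsed:.2f}s")
```

This is the same mechanism the gevent worker pool relies on: each task runs in a greenlet, and any patched blocking call becomes a yield point.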