How to design sequential IO-bound and CPU-bound computing inside iterations

Here is a snippet of code that shows my use case: a loop where I iterate over groups of tasks.

    for idx, task in enumerate(tasks):
        try:
            manager.open_connection(idx)
            for jdx, subtask in enumerate(task):
                subtask.io_bound_read_from_sql()
                subtask.io_bound_read_from_fs()
                subtask.cpu_bound_compute()
                subtask.io_bound_write_to_sql()
        finally:
            manager.close_connection(idx)

Currently, the manager instance wraps SQLAlchemy engine objects in order to manage SQL connections through connection pools. Both the engine and pool objects are said to be "lazy", in the sense that, as far as I understand, the actual connection is only established when a transaction is executed.
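
For example, this is the kind of laziness I mean (a minimal sketch; the SQLite URL is just a placeholder):

    from sqlalchemy import create_engine, text

    # create_engine() does not connect yet; it only configures the engine and its pool
    engine = create_engine("sqlite:///example.db")

    # the first real connection is checked out from the pool only on first use
    with engine.connect() as conn:
        conn.execute(text("SELECT 1"))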

Both io_bound_read_from_sql() and io_bound_write_to_sql() run SQL transactions through the SQLAlchemy engines against a dedicated database identified by idx. The idea is to group related tasks that refer to the same database (these are the "subtasks"), so that the outer loop iterates over the groups and the subtasks of a group perform their transactions on the same database without switching to another one. The point of micro-managing these groups is to avoid opening all the connections/connection pools at once and keeping them alive while unused, and to avoid timeouts/reconnections.

The open_connection(idx) and close_connection(idx) methods wrap, among other things, the calls to SQLAlchemy's create_engine() and engine.dispose(). Similarly, io_bound_read_from_fs() wraps, among other things, asyncio-based concurrent IO functions.
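
To give an idea, the manager is roughly along these lines (a simplified sketch; the real class does more, and the idx-to-URL mapping is a placeholder):

    from sqlalchemy import create_engine

    class ConnectionManager:
        """Simplified sketch of what my manager does; the real one handles more."""

        def __init__(self, urls):
            self.urls = urls      # idx -> database URL
            self.engines = {}     # idx -> live Engine

        def open_connection(self, idx):
            # create_engine() is lazy: the pool connects on the first transaction
            self.engines[idx] = create_engine(self.urls[idx])

        def close_connection(self, idx):
            engine = self.engines.pop(idx, None)
            if engine is not None:
                # dispose() closes the pooled connections for this database
                engine.dispose()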

The functions called on each subtask inside the inner loop run sequentially, but the subtasks are independent of one another. For instance, while the subtask with jdx equal to 1 is doing io_bound_write_to_sql(), I would like the subtask with jdx equal to 2 to start io_bound_read_from_sql() concurrently.
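
As a rough sketch of the overlap I am after with threads (the run_subtask helper and the pool size are placeholders, not code I already have):

    from concurrent.futures import ThreadPoolExecutor

    def run_subtask(subtask):
        # each subtask still runs its own steps in order...
        subtask.io_bound_read_from_sql()
        subtask.io_bound_read_from_fs()
        subtask.cpu_bound_compute()
        subtask.io_bound_write_to_sql()

    # ...but different subtasks overlap: while one is blocked writing to SQL,
    # another can be reading from SQL or from the filesystem
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_subtask, subtask) for subtask in task]
        for future in futures:
            future.result()  # propagate exceptions

This keeps each subtask's own steps in order while letting different subtasks overlap on their IO waits.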

My question is how to design this proof of concept. I would rather not use multiprocessing because it is too complicated for me, and I want to keep multiprocessing for horizontal scaling in the future. I want to use only one process, and use multithreading or asyncio to manage the work of the tasks concurrently. I'm enthusiastic about asyncio, and I saw that SQLAlchemy also has asyncio-based adapters to work with asyncio instead of multithreading / concurrent.futures. Do you have any suggestions or experience/insights to share?
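
For example, something along these lines is what I imagine with asyncio (a sketch only: the *_async method variants, the asyncio.to_thread call for the CPU-bound step, and the per-database engine handling are my assumptions, not code I have):

    import asyncio
    from sqlalchemy.ext.asyncio import create_async_engine

    async def run_subtask(subtask, engine):
        # SQLAlchemy's asyncio extension exposes async connections/transactions
        async with engine.connect() as conn:
            await subtask.io_bound_read_from_sql_async(conn)
        await subtask.io_bound_read_from_fs_async()
        # offload the CPU-bound step so it does not block the event loop
        await asyncio.to_thread(subtask.cpu_bound_compute)
        async with engine.begin() as conn:
            await subtask.io_bound_write_to_sql_async(conn)

    async def run_task(task, url):
        # one async engine per database, created lazily like the sync one
        engine = create_async_engine(url)
        try:
            await asyncio.gather(*(run_subtask(s, engine) for s in task))
        finally:
            await engine.dispose()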
