Python multiprocessing to execute a for loop in parallel


I have a specific requirement where I am using a BigQuery for loop (in a stored procedure) to do some ETL. The loop runs about 35,000 iterations, and each iteration takes almost a minute to complete. I want to write Python multiprocessing code that calls my BigQuery stored procedure from a Python for loop and runs those calls in parallel.

How should I achieve this? Is this really possible, or are there other ways I can run the for loop in parallel? I cannot avoid the for loop.


1 answer below

Elnur Maharramov:
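You can split the 35,000 iterations into batches and hand each batch to a separate worker process with multiprocessing.Pool, for example: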
import multiprocessing

from google.cloud import bigquery

# Define your BigQuery stored procedure call function
def call_stored_procedure(start, batch_size):
    # Each worker process needs its own client; a bigquery.Client
    # cannot be shared safely across processes.
    client = bigquery.Client()
    # Your code: call your stored procedure for this batch, e.g.
    # client.query("CALL `project.dataset.your_procedure`(...)").result()
    # (the procedure name and arguments above are placeholders)

def main():
    num_processes = multiprocessing.cpu_count()  # number of CPU cores
    iterations_per_process = 1000  # batch size; adjust to your workload

    # Create a pool of worker processes
    pool = multiprocessing.Pool(processes=num_processes)

    # Submit one task per batch, keeping the AsyncResult handles so
    # exceptions raised in workers are not silently swallowed
    results = [
        pool.apply_async(call_stored_procedure, (start, iterations_per_process))
        for start in range(0, 35000, iterations_per_process)
    ]

    # Close the pool and wait for all tasks to complete
    pool.close()
    pool.join()

    # Surface any exception that occurred in a worker
    for result in results:
        result.get()

if __name__ == "__main__":
    main()
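
Since the stored procedure calls spend their time waiting on BigQuery rather than on local CPU, threads are often a simpler fit than processes here: a single client can be shared and there is no process start-up cost. A minimal sketch using concurrent.futures (the procedure call, batch scheme, and worker count below are assumptions, not part of the question):

from concurrent.futures import ThreadPoolExecutor, as_completed

from google.cloud import bigquery

def call_stored_procedure(client, start, batch_size):
    # Placeholder: replace with your actual stored procedure call, e.g.
    # client.query("CALL `project.dataset.your_procedure`(...)").result()
    pass

def main():
    client = bigquery.Client()  # generally safe to share across threads
    batch_size = 1000
    # Keep max_workers modest to stay within BigQuery's concurrent-query limits
    with ThreadPoolExecutor(max_workers=16) as executor:
        futures = [
            executor.submit(call_stored_procedure, client, start, batch_size)
            for start in range(0, 35000, batch_size)
        ]
        for future in as_completed(futures):
            future.result()  # re-raise any exception from a worker thread

if __name__ == "__main__":
    main()

Either way, the wall-clock speedup is ultimately bounded by BigQuery's own concurrent-query limits and slot availability, so client-side parallelism only helps up to that point.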