I'm parallelizing the processing of the 1000 columns of a pandas DataFrame, once with joblib.Parallel and once with concurrent.futures.
With joblib I just set n_jobs=-1 and submit one task per column. With concurrent.futures I split the columns into 12 batches (my machine has 12 cores) and have each worker process its batch of columns in a for loop.
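Here is a simplified sketch of the two approaches. The data and `process_column` are toy stand-ins for my real workload, but the parallelization structure is the same:

```python
import numpy as np
import pandas as pd
from joblib import Parallel, delayed
from concurrent.futures import ProcessPoolExecutor

# Deterministic toy data standing in for the real 1000-column frame.
df = pd.DataFrame(np.arange(100_000, dtype=float).reshape(100, 1000))

def process_column(values):
    # Placeholder for the real (more expensive) per-column work.
    return values.sum()

def process_batch(cols):
    # Each worker loops over its assigned batch of columns.
    return [process_column(df[c].to_numpy()) for c in cols]

def run_joblib():
    # Approach 1: one delayed task per column, all cores (n_jobs=-1).
    return Parallel(n_jobs=-1)(
        delayed(process_column)(df[c].to_numpy()) for c in df.columns
    )

def run_futures(n_workers=12):
    # Approach 2: split columns into n_workers batches, one per process.
    batches = np.array_split(np.asarray(df.columns), n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as ex:
        nested = ex.map(process_batch, batches)
    # Flatten the per-batch result lists back into one list of results.
    return [r for batch in nested for r in batch]
```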
I have two questions:
With joblib, why do I see only 8 Python processes running, and not 1000 (one per column)?
Why is joblib faster than concurrent.futures with batches? From what I've read, batch processing is usually better than spawning a task for each column.