Why does per-process overhead constantly increase for multiprocessing?


I have a 6-core CPU with 12 logical CPUs, and I was counting up to very high numbers in a for-loop, several times in parallel.

To speed things up I was using multiprocessing. I was expecting something like:

  • number of processes <= number of CPUs → time roughly the same
  • number of processes = number of CPUs + 1 → time roughly doubled

What I found instead was a continuous increase in time. I'm confused.

The code was:

#!/usr/bin/python

from multiprocessing import Process, Queue
import random
from timeit import default_timer as timer

def rand_val():
    # CPU-bound busy loop: repeatedly overwrite num with a random value
    num = []
    for i in range(200000000):
        num = random.random()
    print('done')

def main():

    for iii in range(15):
        processes = [Process(target=rand_val) for _ in range(iii)]
        start = timer()
        for p in processes:
            p.start()

        for p in processes:
            p.join()

        end = timer()
        print(f'elapsed time: {end - start}')
        print('for ' + str(iii))
        print('')

if __name__ == "__main__":
    main()
    print('done')

Result:

  • elapsed time: 14.9477102 for 1
  • elapsed time: 15.4961154 for 2
  • elapsed time: 16.9633134 for 3
  • elapsed time: 18.723183399999996 for 4
  • elapsed time: 21.568377299999995 for 5
  • elapsed time: 24.126758499999994 for 6
  • elapsed time: 29.142095499999996 for 7
  • elapsed time: 33.175509300000016 for 8

. . .

  • elapsed time: 44.629786800000005 for 11
  • elapsed time: 46.22480710000002 for 12
  • elapsed time: 50.44349420000003 for 13
  • elapsed time: 54.61919949999998 for 14

There are 3 best solutions below

MisterMiyagi On BEST ANSWER

There are two wrong assumptions you make:

  1. Processes are not free. Merely adding processes adds overhead to the program.
  2. Processes do not own CPUs. A CPU interleaves execution of several processes.

The first point is why you see some overhead even though there are fewer processes than CPUs. Note that your system usually has several background processes running, so "fewer processes than CPUs" is not clear-cut for a single application.

The second point is why you see the execution time increase gradually when there are more processes than CPUs. Any OS running mainline Python does preemptive multitasking of processes; roughly, this means a process does not block a CPU until it is done, but is paused regularly so that other processes can run.
In effect, this means that several processes can share one CPU. Since the CPU can still only do a fixed amount of work per unit of time, all processes take longer to complete.
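To see the first point in isolation, you can time processes whose target does nothing: any elapsed time is then pure spawn/join overhead. A minimal sketch (the noop target and the loop bound of 12 are my own choices, not from the question):

from multiprocessing import Process
from timeit import default_timer as timer

def noop():
    pass  # no work at all: any measured time is process start/stop overhead

def measure_overhead(max_procs=12):
    for n in range(1, max_procs + 1):
        procs = [Process(target=noop) for _ in range(n)]
        start = timer()
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(f'{n} empty processes: {timer() - start:.3f}s')

if __name__ == "__main__":
    measure_overhead()

On a typical system this already shows a small, steadily growing cost as the process count rises, even though no real work is being done.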

Amiga500 On

I don't understand what you are trying to achieve.

You are taking the same work and running it X times, where X is the number of processes in your loop. You should instead divide the work by X and send one chunk to each process (see the sketch below).

Anyway, with regard to what you are observing: you are seeing the time it takes to spawn and close the separate processes. Python isn't quick at starting new processes.
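One way to divide the work, sketched here with multiprocessing.Pool (the worker count and the chunking are my own assumptions, not part of the original answer):

from multiprocessing import Pool
import random

TOTAL = 200000000

def burn(iterations):
    # perform this worker's share of the busy loop
    for _ in range(iterations):
        num = random.random()
    return 'done'

if __name__ == "__main__":
    workers = 6  # e.g. one worker per physical core
    chunk = TOTAL // workers
    with Pool(workers) as pool:
        print(pool.map(burn, [chunk] * workers))

Each worker now handles TOTAL // workers iterations, so the wall-clock time should drop as workers are added, up to the number of available cores.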

Lie Ryan On

Your test is faulty.

Imagine this: it takes one day for one farmer to work 10 km² of farmland using a single tractor. Why would you expect two farmers working twice as much farmland (20 km²) using two tractors to take less time?

You have 6 CPU cores; your village has 6 tractors, and nobody has the money to buy a private tractor. As the number of workers (processes) in the village increases, the number of tractors remains the same, so everyone has to share the limited number of tractors.

In an ideal world, two farmers doing twice the work with two tractors would take exactly the same time as one farmer doing one portion of the work, but a real computer has other work to do even when it seems idle. There is task switching, the OS kernel has to run and monitor hardware devices, memory caches need to be flushed and invalidated between CPU cores, your browser needs to run, the village elder is holding a meeting to decide who gets the tractors and when, etc.

Once the number of workers exceeds the number of tractors, farmers don't just hog the tractors for themselves. Instead, they arrange to pass the tractors around every three hours or so, so the seventh farmer doesn't have to wait two days for their share of tractor time. However, there's a cost to moving tractors between farmlands, just as there is a cost for a CPU to switch between processes: switch too frequently and the CPU spends its time switching instead of working; switch too infrequently and you get resource starvation, as some jobs take too long to start being worked on.

A more sensible test would be to keep the size of farmland constant and just increase the number of farmers. In your code, that would correspond to this change:

def rand_val(num_workers):
    num = []
    # the same total amount of work, split evenly across the workers
    for i in range(200000000 // num_workers):  # integer division so range() gets an int
        num = random.random()
    print('done')

def main():
    for iii in range(1, 15):
        # pass the worker count as an argument; a lambda target is not picklable
        # under the 'spawn' start method (Windows/macOS)
        processes = [Process(target=rand_val, args=(iii,)) for _ in range(iii)]
        ...
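For completeness, a full runnable version of that modified test might look like the sketch below; the timing loop mirrors the original code, and starting at one worker (rather than zero) is my own choice so the work can be divided.

#!/usr/bin/python

from multiprocessing import Process
import random
from timeit import default_timer as timer

TOTAL_ITERATIONS = 200000000

def rand_val(num_workers):
    # each worker does its share of a fixed total amount of work
    for _ in range(TOTAL_ITERATIONS // num_workers):
        num = random.random()
    print('done')

def main():
    for workers in range(1, 15):
        processes = [Process(target=rand_val, args=(workers,)) for _ in range(workers)]
        start = timer()
        for p in processes:
            p.start()
        for p in processes:
            p.join()
        print(f'elapsed time: {timer() - start} for {workers}')
        print('')

if __name__ == "__main__":
    main()

With the total work held constant, the elapsed time should stay roughly flat up to the number of physical cores and only grow noticeably beyond that.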