Multithreading vs linear execution of Python code showing absurd results


To the best of my knowledge, for CPU-intensive work, multithreading should perform about the same as linear execution of the same code. So I tested it with this simple code.

import datetime
import threading
import time


def test(args):
    i,wait = args
    for _ in range(i):
        # a = 0
        # while a <= 1000000:
        #     a+=1
        t = datetime.datetime.now() 
        while datetime.datetime.now() <= t + datetime.timedelta(seconds=wait):
            pass

if __name__ =="__main__":

    iteration = 50000
    wait = 0.001
    print(f'Running {iteration} iteration, wait {wait}')

    t1 = threading.Thread(target=test, args=((iteration,wait),))
    t2 = threading.Thread(target=test, args=((iteration,wait),))

    start = time.time()
    t1.start()
    t2.start()

    t1.join()
    t2.join()

    multi = time.time()-start

    start = time.time()
    test((iteration*2,wait))
    print('multi and linear time:',multi,time.time()-start)

The result changes depending on the iteration and wait parameters. I thought the two timings should be similar regardless of these parameters.


Now, if I swap which block is commented out, like this:

a = 0
while a <= 1000000:
    a+=1
# t = datetime.datetime.now() 
# while datetime.datetime.now() <= t + datetime.timedelta(seconds=wait):
#          pass

The results are much more similar.


Can someone please explain these results?

Answer by jsbueno:

So, the problem is that with a small wait time the threaded code takes almost 50% longer than the linear code, while with a larger wait time the threaded and linear runs take about the same time.

This is almost certainly due to implementation details in the way the Python runtime locks and unlocks resources around the OS calls that actually return the time, and it seems to show up when you are working with the 0.001 delay.

I'd say that the time Python takes to set up and get out of the while loop is more significant than repeating many calls to datetime.datetime.now() once the setup is complete. Setting up the while loop takes a lot of time compared with the 0.001 wait in the loop, while the same setup time (close to 0.0005 seconds, going by the 50% extra time) is negligible for the 1 second delay.
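One way to check this guess is to measure the pieces separately. The following is a minimal sketch, not a definitive diagnosis: the helper name polls_per_wait is made up for illustration, and the lambda passed to timeit adds a little call overhead of its own. It times a single loop check, counts how many checks fit in one wait period, and prints the interval at which CPython asks the running thread to release the GIL (0.005 seconds by default, which is longer than the 0.001 wait).

```python
import datetime
import sys
import timeit

wait = 0.001
t = datetime.datetime.now()
n = 100_000

# Average cost of one loop check: a clock read plus the timedelta arithmetic.
per_check = timeit.timeit(
    lambda: datetime.datetime.now() <= t + datetime.timedelta(seconds=wait),
    number=n,
) / n


def polls_per_wait(wait):
    """Count how many times the polling loop spins during one wait period.

    Hypothetical helper, written only for this illustration.
    """
    count = 0
    start = datetime.datetime.now()
    while datetime.datetime.now() <= start + datetime.timedelta(seconds=wait):
        count += 1
    return count


print(f"one loop check costs about {per_check * 1e6:.2f} microseconds")
print(f"checks during a {wait} s wait: {polls_per_wait(wait)}")
# Interval after which CPython requests a GIL hand-over between threads.
print(f"GIL switch interval: {sys.getswitchinterval()} s")
```

If the number of checks per wait is small, fixed per-iteration costs and thread switching can plausibly account for a visible share of the total time.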

Anyway, it comes down to minor details, and it illustrates that multi-threading in Python does have a lot of peculiarities. You will be best served by keeping in mind that, for CPU-intensive tasks, Python multi-threading should simply be avoided.

But for Python 3.12 (now in alpha), PEP 684, which allows an independent GIL in each sub-interpreter, may make it possible to overcome some of these limitations (by running a thread in each independent interpreter). Wait for it.

(Shameless self-promotion): the API for using sub-interpreters will be a bit rough at first, but I am working on a third-party package to make it as easy as working with threads: "extrainterpreters".


Running the code

Back here: I could not resist trying this code and including the "extrainterpreters" measurements. It turns out it only gets wilder.

So, first: I am not in the mood to wait 600 seconds for a test run, so I cut back on the number of iterations in your script. Then I refactored it a bit, so that no edits are needed to run the tests with different parameters.

Second, I just added the modality using sub-interpreters to the mix.

As for the results:

I) I got the opposite of your results: the multi-threaded run was actually faster than the linear one in the same scenario where it was 50% slower for you. That should be due either to (1) changes in the run conditions, given the smaller number of iterations and the ratio of the fixed "0.001" delay to raw CPU speed on my machine versus yours (mine is a 2018-era dual-core i7, a bit dated), or to (2) improvements in the Python runtime in 3.12 alpha versus your version (I think you didn't say which Python version you have, but these timings should have changed a lot in 3.11, and some more in 3.12).

II) The improvement from using sub-interpreters varied a lot. The 10-iteration run that just does increments in Python code took about twice as long in sub-interpreter mode (possibly due to some resource shared across interpreters in the setup of the function call itself; it may even be a problem in my code). The same mode with 1000 iterations (down from your 50000) showed, on the other hand, a significant gain with sub-interpreters.
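Since run conditions can shift results this much, it may help to repeat each measurement and keep the best run. The following is only a sketch under assumptions: the helper names (busy, time_threaded, time_linear, best_of) are invented here and are not part of the benchmark script, and time.perf_counter() stands in for time.time() because it is a monotonic, high-resolution clock.

```python
import threading
import time


def busy(n):
    """Pure-Python counting loop, similar to the "increment" mode."""
    a = 0
    while a <= n:
        a += 1


def time_threaded(func, args, n_threads=2):
    """Run func in n_threads threads and return the elapsed wall time."""
    threads = [threading.Thread(target=func, args=args) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def time_linear(func, args, repeats=2):
    """Run func repeats times in a row and return the elapsed wall time."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return time.perf_counter() - start


def best_of(measure, runs=5):
    """Repeat a measurement and keep the fastest run to reduce noise."""
    return min(measure() for _ in range(runs))


threaded = best_of(lambda: time_threaded(busy, (100_000,)))
linear = best_of(lambda: time_linear(busy, (100_000,)))
print(f"threaded: {threaded:.4f} s, linear: {linear:.4f} s")
```

Taking the minimum of several runs filters out one-off slowdowns from other processes, although it does not remove systematic differences between machines or Python versions.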

Here are the results, followed by the script:

$ python bench1.py 
Running 1000 iteration, wait 0.001, increment, threaded
Running 1000 iteration, wait 0.001, increment, multi-interpreter
Running 1000 iteration, wait 0.001, increment, linear,
Mode: increment
        multi-threaded time: 6.3514
        Multi-interpreter time: 4.0171
        linear time: 7.3083


Running 1000 iteration, wait 0.001, time_polling, threaded
Running 1000 iteration, wait 0.001, time_polling, multi-interpreter
Running 1000 iteration, wait 0.001, time_polling, linear,
Mode: time_polling
        multi-threaded time: 1.6991
        Multi-interpreter time: 1.1579
        linear time: 2.0043


Running 10 iteration, wait 1, increment, threaded
Running 10 iteration, wait 1, increment, multi-interpreter
Running 10 iteration, wait 1, increment, linear,
Mode: increment
        multi-threaded time: 0.0672
        Multi-interpreter time: 0.1302
        linear time: 0.0630


Running 10 iteration, wait 1, time_polling, threaded
Running 10 iteration, wait 1, time_polling, multi-interpreter
Running 10 iteration, wait 1, time_polling, linear,
Mode: time_polling
        multi-threaded time: 10.0184
        Multi-interpreter time: 10.1104
        linear time: 20.0001
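
To make the numbers above easier to compare, the ratio of linear time to each parallel time can be computed directly. This is a small sketch that only re-uses the timings printed above; the dictionary layout and the speedups helper are made up for illustration.

```python
# Timings in seconds, copied from the benchmark output above.
results = {
    ("increment", 1000, 0.001): {"threaded": 6.3514, "interpreters": 4.0171, "linear": 7.3083},
    ("time_polling", 1000, 0.001): {"threaded": 1.6991, "interpreters": 1.1579, "linear": 2.0043},
    ("increment", 10, 1): {"threaded": 0.0672, "interpreters": 0.1302, "linear": 0.0630},
    ("time_polling", 10, 1): {"threaded": 10.0184, "interpreters": 10.1104, "linear": 20.0001},
}


def speedups(timings):
    """Return linear time divided by each parallel time (above 1 means faster than linear)."""
    return {name: timings["linear"] / t for name, t in timings.items() if name != "linear"}


for (mode, iterations, wait), timings in results.items():
    ratios = speedups(timings)
    print(
        f"{mode:12} {iterations:5} x {wait:<5} "
        f"threaded {ratios['threaded']:.2f}x, interpreters {ratios['interpreters']:.2f}x"
    )
```

Note that the time_polling mode with a 1 second wait comes out close to 2x for both parallel variants. That is expected from the workload itself: each thread or interpreter polls the wall clock, so two workers wait through their 10 seconds at the same time, while the linear run waits through 20 seconds in a row.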

The script. To use it while sub-interpreters are not generally available, just comment out the lines that use them.

It requires a special branch from Eric Snow where the code for PEPs 684 and 554 is in development. It should work in Python 3.12 beta when it's out (in 3 more weeks), but there is a chance they will not approve PEP 554, which provides the Python-side code to run sub-interpreters. Nonetheless, once you get a supported Python runtime, you can install extrainterpreters from https://github.com/jsbueno/extrainterpreters (with pip install git+https://github.com/jsbueno/extrainterpreters.git)
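
As an alternative to commenting lines out by hand, the sub-interpreter part could be skipped automatically when the package cannot be imported. This is a hedged sketch: it assumes only the ei.Interpreter(target=..., args=...), start() and join() usage that appears in the script below, and the names work and time_interpreters are invented for illustration.

```python
import time

try:
    # Needs a Python runtime with sub-interpreter support.
    import extrainterpreters as ei
except ImportError:
    ei = None


def work(n):
    """Pure-Python counting loop used as a stand-in workload."""
    a = 0
    while a <= n:
        a += 1


def time_interpreters(func, args, count=2):
    """Return elapsed seconds, or None when extrainterpreters is unavailable."""
    if ei is None:
        return None
    interps = [ei.Interpreter(target=func, args=args) for _ in range(count)]
    start = time.perf_counter()
    for interp in interps:
        interp.start()
    for interp in interps:
        interp.join()
    return time.perf_counter() - start


elapsed = time_interpreters(work, (100_000,))
print("multi-interpreter time:", "skipped" if elapsed is None else f"{elapsed:.4f}")
```

With this guard the threaded and linear measurements still run on any Python version, and the multi-interpreter figure is simply reported as skipped.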

# initial code by Gaurav Aggarwal on stackoverflow question
# https://stackoverflow.com/questions/76171191/multithreading-vs-linear-execution-of-python-code-showing-absurd-results/76187049#76187049

import datetime
import threading
import time

import extrainterpreters as ei


def test(iterations, wait, mode="increment"):
    for _ in range(iterations):
        if mode == "increment":
            a = 0
            while a <= 100000:
                a+=1
        else:
            t = datetime.datetime.now()
            while datetime.datetime.now() <= t + datetime.timedelta(seconds=wait):
                pass

if __name__ =="__main__":

    for iteration, wait in ((1000, .001), (10, 1)):
        for mode in ("increment", "time_polling"):
            print(f"Running {iteration} iteration, wait {wait}, {mode}, threaded")
            threads = [threading.Thread(target = test, args=(iteration, wait, mode)) for _ in (0,1)]
            start = time.time()
            # Start both threads, then wait for both to finish.
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            multi_thread = time.time()-start

            print(f"Running {iteration} iteration, wait {wait}, {mode}, multi-interpreter")
            interps = [ei.Interpreter(target = test, args=(iteration, wait, mode)) for _ in (0,1)]
            start = time.time()
            # Start both sub-interpreters, then wait for both to finish.
            for i in interps:
                i.start()
            for i in interps:
                i.join()
            multi_interpreter = time.time()-start

            print(f"Running {iteration} iteration, wait {wait}, {mode}, linear,")
            start = time.time()
            test(iteration*2, wait, mode)
            linear = time.time() - start

            print(f"Mode: {mode}\n\tmulti-threaded time: {multi_thread:.4f}\n\tMulti-interpreter time: {multi_interpreter:.4f}\n\tlinear time: {linear:.4f}\n\n")