Python compile-time optimization


I am struggling to understand how much optimizing the Python compiler actually does. I ran a few tests; for example, consider this really simple script:

def to_optimize():
    for i in range(10000):
        a = 1
        b = 2
        c = a + b
        print(c)

When I import this function from another script and disassemble it, I get:

>>> dis.dis(optimization.to_optimize)
2           0 LOAD_GLOBAL              0 (range)
            2 LOAD_CONST               1 (10000)
            4 CALL_FUNCTION            1
            6 GET_ITER
      >>    8 FOR_ITER                28 (to 38)
           10 STORE_FAST               0 (i)

3          12 LOAD_CONST               2 (1)
           14 STORE_FAST               1 (a)

4          16 LOAD_CONST               3 (2)
           18 STORE_FAST               2 (b)

5          20 LOAD_FAST                1 (a)
           22 LOAD_FAST                2 (b)
           24 BINARY_ADD
           26 STORE_FAST               3 (c)

6          28 LOAD_GLOBAL              1 (print)
           30 LOAD_FAST                3 (c)
           32 CALL_FUNCTION            1
           34 POP_TOP
           36 JUMP_ABSOLUTE            8
      >>   38 LOAD_CONST               0 (None)
           40 RETURN_VALUE

As you can see, almost no optimization is applied at all. I think a piece of code like this would be very easy to optimize further. I know that Python compiles modules only when they are first run or imported, so I do not expect the .pyc files created during a normal execution to be heavily optimized, but even when I pre-compile the script with python3 -m compileall the result is the same, and the only optimization flags I could find (-O/-OO) just strip asserts and docstrings. Is this really the maximum optimization achievable for a Python script? Is there some language limitation?
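
For comparison, a quick disassembly of two small helper functions (my own test functions, not part of the script above) suggests that CPython does fold expressions written directly from literals; it just does not propagate constants through variables:

import dis

def folded():
    # an expression built only from literals: the 1 + 2 is folded into a single LOAD_CONST 3
    c = 1 + 2
    return c

def not_folded():
    # the same value computed through local variables: no folding happens
    a = 1
    b = 2
    c = a + b
    return c

dis.dis(folded)      # shows LOAD_CONST 3 with no addition at runtime
dis.dis(not_folded)  # shows LOAD_CONST 1, LOAD_CONST 2 and BINARY_ADD, as in the output above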

Answer by Jérôme Richard:

The default implementation of Python is CPython, which is a fairly slow interpreter. The compilation step is meant to produce bytecode so as to speed up the next interpreter run (so the code does not have to be parsed again and again). Python was designed mainly to execute glue code and scripts, clearly not numerically intensive code written as pure-Python expressions, so it optimizes very few things. Indeed, optimizations would slow down startup time, and since Python is a very dynamic language, only a few basic optimizations are possible during bytecode generation. For example, an imported module (a dynamic operation) can redefine how objects are added (typically by setting __add__), producing different results. That being said, AFAIK this is not possible for the built-in int type, so constant propagation on integers would be possible in this case; it is simply a missed optimization. Still, the range and print names can be rebound to user-specified functions, so the loop cannot be optimized out by any compile-time method.
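
As a minimal sketch of that last point (assuming the question's module is really named optimization, as in the dis.dis call above), another module can rebind print inside it, which gives the seemingly useless loop an observable effect, so no compile-time pass could safely remove it:

import optimization

calls = []
# shadow the built-in print seen by to_optimize: LOAD_GLOBAL checks the
# module's globals before falling back to the builtins
optimization.print = calls.append

optimization.to_optimize()
print(len(calls))  # 10000: every iteration is now observable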

There are other implementations of Python meant to optimize the code at runtime, typically using a just-in-time (JIT) compiler. This is the case for PyPy and Pyston. Such implementations are generally much faster for basic numerical code, since the JIT can optimize expressions and translate them to native code at runtime. That being said, PyPy does not support all modules. For numerical code like this, an embedded JIT like Numba can be even faster (close to native C/C++ code), though it only supports a subset of Python and NumPy (as of now).
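
As a rough sketch of the Numba route (numba.njit is its standard decorator; the print call is dropped in favour of an accumulated value, since I/O is not what a JIT speeds up, and the actual speed-up depends on the machine and Numba version):

import numba

@numba.njit
def summed(n):
    # compiled to native code on the first call; the constant work in the
    # body can then actually be optimized by the JIT backend (LLVM)
    total = 0
    for i in range(n):
        a = 1
        b = 2
        total += a + b
    return total

summed(10)             # first call triggers the compilation
print(summed(10000))   # later calls run the compiled native code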

Note that the CPython developers recently decided to significantly optimize the interpreter, so basic optimizations like this might be added soon. Still, the resulting code will likely be far slower than what PyPy can currently do.