Measure CPU clock cycles per operation in Python


I'd like to know how you would measure the number of clock cycles per instruction, say, for copying an int from one place to another.

I know you can time things down to nanoseconds, but with today's CPUs that resolution is too low to get a correct reading for operations that take just a few clock cycles.

Is there a way to confirm how many clock cycles instructions like adding and subtracting take in Python? If so, how?


There is 1 best solution below

not2qubit On

This is a very interesting question that can easily send you down a rabbit hole. Basically, any CPU cycle measurement depends on how your processor and your compiler implement RDTSC (the x86 Read Time-Stamp Counter instruction).

For Python there is a package called hwcounter that can be used as follows:

# pip install hwcounter 

from hwcounter import Timer, count, count_end
from time import sleep

# Method-1
start = count()
# Do something here:
sleep(1)
elapsed = count_end() - start
print(f'Elapsed cycles: {elapsed:,}')

# Method-2
with Timer() as t:
    # Do something here:
    sleep(1)
print(f'Elapsed cycles: {t.cycles:,}')

NOTE: It seems that the hwcounter implementation is currently broken for Windows Python builds. A working alternative is to build the pip package with the MinGW compiler instead of MS Visual Studio.


Caveats

Results from this method always depend on how your computer schedules tasks and threads among its processors. Ideally you'd need to:

  • Bind the test code to one unused processor (a.k.a. processor affinity).
  • Run the tests 1k to 1M times to get a good average.
  • Have a good understanding not only of compilers, but also of how Python optimizes its code internally. Many things are not at all obvious, especially if you come from a C/C++/C# background.
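
The first two caveats can be sketched roughly as follows. This is only an illustration, not a definitive benchmark harness: the helper names pin_to_one_cpu and measure are made up for this example, os.sched_setaffinity is only available on some platforms (notably Linux), and the fallback to time.perf_counter_ns is an assumption added so the sketch still runs when hwcounter is not installed (it then reports nanoseconds, not cycles).

```python
import os
import statistics
import time

try:
    # Cycle counter from the hwcounter package used in the answer.
    from hwcounter import count, count_end
    UNIT = "cycles"
except ImportError:
    # Assumption for portability: fall back to nanoseconds, NOT cycles.
    count = count_end = time.perf_counter_ns
    UNIT = "ns"


def pin_to_one_cpu():
    """Bind this process to one CPU if the OS allows it; return its index or None."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. Windows and macOS do not provide this call
    try:
        cpu = max(os.sched_getaffinity(0))  # arbitrary pick; an idle core is better
        os.sched_setaffinity(0, {cpu})
        return cpu
    except OSError:
        return None


def measure(func, repeats=100_000):
    """Call func() `repeats` times and summarise the per-call counter deltas."""
    deltas = []
    for _ in range(repeats):
        start = count()
        func()
        deltas.append(count_end() - start)
    return {
        "samples": len(deltas),
        "min": min(deltas),
        "median": statistics.median(deltas),
    }


if __name__ == "__main__":
    pin_to_one_cpu()
    a, b = 3, 4
    baseline = measure(lambda: None)   # call and measurement overhead only
    addition = measure(lambda: a + b)  # same overhead plus one int addition
    for name, stats in (("baseline", baseline), ("a + b", addition)):
        print(f"{name}: min {stats['min']:,} {UNIT}, median {stats['median']:,} {UNIT}")
```

Comparing the addition against the empty baseline hints at the third caveat: most of what gets counted is interpreter and function-call overhead, so the difference between the two is at best a rough upper bound on the operation itself, not a per-instruction cycle count.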

Rabbit Hole: