Specifications:
- Intel i5-1035G1
- Ubuntu 22.04.3 LTS (Dual boot)
Hi everyone!
I tried to write a STREAM benchmark on my computer for a course, but the bandwidth results I get seem really too high.
First I did it in plain Python, in a Jupyter notebook. The results were OK (~1 GB/s for lists, decreasing with length; 0.7 GB/s for array.array, fairly constant).
Anyway, that's not where the problem is.
Then I tried with Cython, and now it reports a bandwidth of 14 GB/s (with 8-byte elements). I know that even if that is possible in theory, it can't be right here, because my code is not multi-threaded (to be sure, I ran htop during the benchmark and indeed only one core was at 100% while the others stayed below 5%).
Please help my stupid ass
Cython file:

#cython: boundscheck=False
import time

import numpy as np

cimport numpy as cnp


def stream(unsigned int STREAM_ARRAY_SIZE):
    cdef cnp.float64_t[:] a, b, c
    cdef double scalar

    a = np.ones(STREAM_ARRAY_SIZE, dtype=np.float64)
    b = np.ones(STREAM_ARRAY_SIZE, dtype=np.float64) * 2.0
    c = np.zeros(STREAM_ARRAY_SIZE, dtype=np.float64)
    scalar = 2.0

    times = [0, 0, 0, 0]
    timer = time.time_ns

    def copy():
        cdef unsigned int i
        times[0] = timer()
        for i in range(STREAM_ARRAY_SIZE):
            c[i] = a[i]
        times[0] = timer() - times[0]

    def scale():
        cdef unsigned int i
        times[1] = timer()
        for i in range(STREAM_ARRAY_SIZE):
            b[i] = scalar * c[i]
        times[1] = timer() - times[1]

    def add():
        cdef unsigned int i
        times[2] = timer()
        for i in range(STREAM_ARRAY_SIZE):
            c[i] = a[i] + b[i]
        times[2] = timer() - times[2]

    def triad():
        cdef unsigned int i
        times[3] = timer()
        for i in range(STREAM_ARRAY_SIZE):
            a[i] = b[i] + scalar * c[i]
        times[3] = timer() - times[3]

    copy()
    scale()
    add()
    triad()

    # Times are in ns, so without conversion, the calculation would be in GB/s
    return times
Python file:

import statistics

import matplotlib.pyplot as plt

import cythonstream


def bandwidth(STREAM_ARRAY_SIZE):
    times = cythonstream.stream(STREAM_ARRAY_SIZE)
    copy, scale, add, triad = times
    copy = (2 * 8 * STREAM_ARRAY_SIZE) / copy
    scale = (2 * 8 * STREAM_ARRAY_SIZE) / scale
    add = (3 * 8 * STREAM_ARRAY_SIZE) / add
    triad = (3 * 8 * STREAM_ARRAY_SIZE) / triad
    return copy, scale, add, triad


def plot(nb_experiment=5):
    """
    Plot the STREAM benchmark, averaged over nb_experiment runs.
    """
    WANTED_VALUES = list(range(1000 * 1000, 50 * 1000 * 1000, 1000 * 1000))
    copy_bandwidth, scale_bandwidth, add_bandwidth, triad_bandwidth = (
        [[] for _ in range(nb_experiment)],
        [[] for _ in range(nb_experiment)],
        [[] for _ in range(nb_experiment)],
        [[] for _ in range(nb_experiment)],
    )
    for i in range(nb_experiment):
        for value in WANTED_VALUES:
            copy, scale, add, triad = bandwidth(value)
            copy_bandwidth[i].append(copy)
            scale_bandwidth[i].append(scale)
            add_bandwidth[i].append(add)
            triad_bandwidth[i].append(triad)
    # Averages over nb_experiment and plot against WANTED_VALUES
    # ...


if __name__ == "__main__":
    plot()
I used these formulae from my course for the total amount of data moved by each kernel:
copy -> 2 * sizeof(STREAM_ARRAY_TYPE) * STREAM_ARRAY_SIZE,
scale -> 2 * sizeof(STREAM_ARRAY_TYPE) * STREAM_ARRAY_SIZE,
add -> 3 * sizeof(STREAM_ARRAY_TYPE) * STREAM_ARRAY_SIZE,
triad -> 3 * sizeof(STREAM_ARRAY_TYPE) * STREAM_ARRAY_SIZE
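As a quick sanity check of the factors (a standalone sketch): copy and scale each touch two arrays per iteration, while add and triad touch three, which matches the 2x/2x/3x/3x factors used in bandwidth().

```python
SIZE = 1_000_000   # elements
ELEM = 8           # sizeof(double) in bytes

# copy:  c[i] = a[i]                -> 1 read + 1 write  = 2 arrays
# scale: b[i] = scalar * c[i]       -> 1 read + 1 write  = 2 arrays
# add:   c[i] = a[i] + b[i]         -> 2 reads + 1 write = 3 arrays
# triad: a[i] = b[i] + scalar*c[i]  -> 2 reads + 1 write = 3 arrays
copy_bytes = scale_bytes = 2 * ELEM * SIZE
add_bytes = triad_bytes = 3 * ELEM * SIZE
print(copy_bytes, add_bytes)  # 16000000 24000000
```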
PS: It turns out the results I got in Jupyter (for plain Python) are wrong anyway, because I used sys.getsizeof, which always returns 24 bytes per element instead of the 8 bytes a double actually occupies in array.array, for example. So those numbers should be even lower, which makes the gap with Cython even more suspicious.
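To show what I mean about getsizeof (a standalone illustration): sys.getsizeof reports the size of the whole boxed Python float object (24 bytes on 64-bit CPython), not the 8-byte double that array.array actually stores per element.

```python
import sys
from array import array

x = 1.0
print(sys.getsizeof(x))          # 24 on 64-bit CPython: object header + the 8-byte double
print(array("d", [x]).itemsize)  # 8: the raw C double actually stored per element
```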