I've recently had to implement a simple brute-force program in Python, and I was getting terrible execution times (even for an O(n^2) time complexity), topping 10 minutes of runtime for a total of 3700 * 11125 * 2 = 82325000 access operations on numpy arrays (Intel i5-4300U).
I'm talking about access operations because I initialized all the arrays beforehand (outside the brute-force loops) and then just recycled them by overwriting, without reallocating anything.
I get that brute-force algorithms are supposed to be very slow, but 82 million accesses on contiguous-memory float arrays should not take 10 minutes even on the oldest of CPUs...
After some research I found this post: Why is numpy.array so slow?, which led me to think that maybe numpy arrays of length 3700 are too small to overcome some sort of overhead (though I don't know what overhead, since, as I said, I recycle my arrays rather than reallocating them). So I tried substituting lists for the arrays and... voilà, runtime was down to under a minute (55 seconds), roughly a 10x decrease!
But now I'm baffled as to why on earth a list can be faster than an array, and more importantly, where the threshold is. The cited post says that numpy arrays are very efficient on large quantities of data, but where does "a lot of data" become "enough data" to actually get an advantage? Is there a safe way to determine whether to use arrays or lists? Should I worry about writing my functions twice, one version for each case?
For anyone wondering, the original script was something like (list version, mockup data):
    X = [0] * 3700                  # list of 3700 elements (mockup data)
    y = [0] * 3700
    combinations = [(0, 0)] * 11125
    # build independent result rows (repeating [[0, 0, 0]] with * would alias a single row)
    grg_results = [[0, 0, 0] for _ in range(len(combinations))]
    rgr_results = [[0, 0, 0] for _ in range(len(combinations))]
    grg_temp = [100] * (3700 + 1)
    rgr_temp = [100] * (3700 + 1)

    for comb in range(len(combinations)):
        pivot_a = combinations[comb][0]
        pivot_b = combinations[comb][1]
        for i in range(len(X)):
            _x = X[i]
            _y = y[i]
            if _x < pivot_a:
                grg_temp[i + 1] = _y * grg_temp[i]
                rgr_temp[i + 1] = (2 - _y) * rgr_temp[i]
            elif pivot_a <= _x <= pivot_b:
                grg_temp[i + 1] = (2 - _y) * grg_temp[i]
                rgr_temp[i + 1] = _y * rgr_temp[i]
            else:
                grg_temp[i + 1] = _y * grg_temp[i]
                rgr_temp[i + 1] = (2 - _y) * rgr_temp[i]
        grg_results[comb][0] = pivot_a
        grg_results[comb][1] = pivot_b
        rgr_results[comb][0] = pivot_a
        rgr_results[comb][1] = pivot_b
        grg_results[comb][2] = metrics[0](grg_temp)  # metrics: scoring functions defined elsewhere
        rgr_results[comb][2] = metrics[0](rgr_temp)
Let's do some simple list and array comparisons.
Make a list of 0s (as you do):
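Something along these lines; all the timing snippets below are sketches using the standard timeit module (the original measurements aren't reproduced here, and the absolute numbers will vary by machine):

    import timeit

    # build a 3700-element list of zeros, as in the question
    print(timeit.timeit('alist = [0] * 3700', number=10000))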
Make an array from that list - a lot more time:
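For instance (np.array has to walk the existing list and copy every element into a new buffer):

    import timeit

    # converting a Python list costs a full pass plus a per-element copy
    print(timeit.timeit('np.array(alist)',
                        setup='import numpy as np; alist = [0] * 3700',
                        number=10000))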
Make the array of 0s in the correct numpy way:
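Something like this, allocating directly with np.zeros:

    import timeit

    # numpy allocates and zero-fills the buffer without touching a Python list
    print(timeit.timeit('np.zeros(3700)',
                        setup='import numpy as np',
                        number=10000))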
Now sum all the elements of the list:
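A sketch with the built-in sum:

    import timeit

    # pure-Python sum over a 3700-element list
    print(timeit.timeit('sum(alist)',
                        setup='alist = [0] * 3700',
                        number=10000))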
Sum of the array:
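And the numpy method, something like:

    import timeit

    # numpy's sum runs its loop in compiled code
    print(timeit.timeit('arr.sum()',
                        setup='import numpy as np; arr = np.zeros(3700)',
                        number=10000))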
In this case the list is a bit faster; but make a much larger list/array:
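A sketch of that larger case; the 100x factor here is an arbitrary choice for illustration:

    import timeit

    # same two sums, but on 370000 elements instead of 3700
    print(timeit.timeit('sum(alist)',
                        setup='alist = [0] * 370000',
                        number=1000))
    print(timeit.timeit('arr.sum()',
                        setup='import numpy as np; arr = np.zeros(370000)',
                        number=1000))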
The list sum scales as O(n); the array sum scales much better (it has a higher "setup" cost, but a much lower per-element cost), because its iteration is done in C code.
Add 1 to all elements of the list:
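For example, with a list comprehension:

    import timeit

    # build a new list with every element incremented
    print(timeit.timeit('[i + 1 for i in alist]',
                        setup='alist = [0] * 3700',
                        number=10000))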
A similar list comprehension on the array is slower:
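Something like the following; each array element has to be converted back into a Python-level object as the comprehension iterates:

    import timeit

    # same comprehension, but iterating over a numpy array
    print(timeit.timeit('[i + 1 for i in arr]',
                        setup='import numpy as np; arr = np.zeros(3700)',
                        number=10000))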
But doing a whole-array operation, where the iteration and the addition are done in C code, is much faster; see the sketch below. For smaller sizes the list comprehension will be faster; there's no clear threshold.
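A sketch of that whole-array version:

    import timeit

    # arr + 1 loops over the buffer in compiled C code
    print(timeit.timeit('arr + 1',
                        setup='import numpy as np; arr = np.zeros(3700)',
                        number=10000))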