I am trying to do the following using NumPy. Because the size of aa is large, the NumPy version is slow. I tried to speed it up with numba, which gave some improvement, but I would like to speed it up further, because this runs inside another loop. Any advice is much appreciated!
Using numpy:
import numpy as np

def get_prob(aa):
    # distance of each entry from its column-wise maximum (over axis 1)
    allmax = aa.max(axis=1)[:, None]
    findmax = aa - allmax
    # tie-breaking: if rows 1 and 2 are both at the max, demote row 1
    mask = (findmax[:, 1, :] == 0) & (findmax[:, 2, :] == 0)
    findmax[:, 1, :][mask] = -1
    # if rows 0 and 1 are both at the max, demote row 0
    mask = (findmax[:, 0, :] == 0) & (findmax[:, 1, :] == 0)
    findmax[:, 0, :][mask] = -1
    # if all three rows are still at the max, keep only row 2
    mask = (findmax[:, 0, :] == 0) & (findmax[:, 1, :] == 0) & (findmax[:, 2, :] == 0)
    findmax[:, 0, :][mask] = -1
    findmax[:, 1, :][mask] = -1
    # one-hot indicator of the (tie-broken) maximum, with axes reordered
    p = np.where(findmax < 0, 0.0, 1.0).transpose(0, 2, 1)
    return p
Using numba:
import numba

@numba.jit(nopython=True)
def get_prob_nb(aa, num_params, num_action):
    p = np.zeros_like(aa)
    for i in range(num_params):
        for j in range(num_action):
            a1 = aa[i, 0, j]
            a2 = aa[i, 1, j]
            a3 = aa[i, 2, j]
            # one-hot of the maximum; ties go to the higher index
            if a1 > a2 and a1 > a3:
                p[i, 0, j] = 1.0
            elif a2 >= a1 and a2 > a3:
                p[i, 1, j] = 1.0
            elif a3 >= a2 and a3 >= a1:
                p[i, 2, j] = 1.0
    p = p.transpose(0, 2, 1)
    return p
import time

rng = np.random.default_rng()
aa = rng.uniform(0.0, 1.0, 9000000)
aa = aa.reshape(1000, 3, 3000)
start = time.time()
get_prob_nb(aa, 1000, 3000)
print("elapsed", time.time() - start)
There's a surprisingly simple way to parallelize your numba call:
Running your test, I saw a time improvement of ~30% versus get_prob_nb.