How to speed up Numpy


I am trying to do the following with NumPy. Because aa is large, the NumPy version is slow. I tried to speed it up with Numba, and there is some improvement, but I would like to speed it up further because this function is part of another loop. Any advice is much appreciated!

Using numpy:

import numpy as np

def get_prob(aa):
    # aa has shape (num_params, 3, num_action)
    allmax = aa.max(axis=1)[:, None]   # column maximum over the 3 actions, shape (num_params, 1, num_action)
    findmax = aa - allmax              # zero where an entry equals the column maximum

    # Tie-breaking: where actions tie at the maximum, demote the lower-index ones
    mask = (findmax[:, 1, :] == 0) & (findmax[:, 2, :] == 0)
    findmax[:, 1, :][mask] = -1

    mask = (findmax[:, 0, :] == 0) & (findmax[:, 1, :] == 0)
    findmax[:, 0, :][mask] = -1

    mask = (findmax[:, 0, :] == 0) & (findmax[:, 1, :] == 0) & (findmax[:, 2, :] == 0)
    findmax[:, 0, :][mask] = -1
    findmax[:, 1, :][mask] = -1

    # 1.0 for the remaining maxima, 0.0 elsewhere; result has shape (num_params, num_action, 3)
    p = np.where(findmax < 0, 0.0, 1.0).transpose(0, 2, 1)
    return p
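For reference, here is a quick sanity check of the NumPy version (the shape matches the benchmark further down; the seed is arbitrary):

import numpy as np

# Sanity-check sketch: same shape as the benchmark below.
rng = np.random.default_rng(0)
aa = rng.uniform(0.0, 1.0, (1000, 3, 3000))

p = get_prob(aa)
print(p.shape)                    # (1000, 3000, 3) after the final transpose
print(np.unique(p.sum(axis=2)))   # [1.] -- exactly one of the three entries is 1.0 per (i, j)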

Using numba:

import numba

@numba.jit(nopython=True)
def get_prob_nb(aa, num_params, num_action):
    p = np.zeros_like(aa)

    for i in range(num_params):
        for j in range(num_action):
            a1 = aa[i, 0, j]
            a2 = aa[i, 1, j]
            a3 = aa[i, 2, j]
            # Mark the single largest of the three values; ties go to the higher index.
            if a1 > a2 and a1 > a3:
                p[i, 0, j] = 1.
            elif a2 >= a1 and a2 > a3:
                p[i, 1, j] = 1.
            elif a3 >= a2 and a3 >= a1:
                p[i, 2, j] = 1.

    p = p.transpose(0, 2, 1)
    return p
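To make sure the Numba version matches the NumPy one, I compare them on tie-free random data (with continuous uniform draws, exact ties are effectively impossible, so the different tie-breaking should not matter). A sketch of that check:

import numpy as np

# Equivalence-check sketch between the two implementations above.
rng = np.random.default_rng(0)
aa = rng.uniform(0.0, 1.0, (1000, 3, 3000))
print(np.array_equal(get_prob(aa), get_prob_nb(aa, 1000, 3000)))  # expected True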

import time

rng = np.random.default_rng()
aa = rng.uniform(0.0, 1.0, 9000000)
aa = aa.reshape(1000, 3, 3000)

start = time.time()
get_prob_nb(aa, 1000, 3000)
print("elapsed", time.time() - start)

2 Answers

Answer from Saul Aryeh Kohn (score 2):

There's a surprisingly simple way to parallelize your numba call:

import numba
import numpy as np

@numba.jit(nopython=True, parallel=True)
def get_prob_nb_parallel(aa, num_params, num_action):
    p = np.zeros_like(aa)

    # prange lets Numba split the outer loop across threads when parallel=True.
    for i in numba.prange(num_params):
        for j in range(num_action):
            a1 = aa[i, 0, j]
            a2 = aa[i, 1, j]
            a3 = aa[i, 2, j]
            if a1 > a2 and a1 > a3:
                p[i, 0, j] = 1.
            elif a2 >= a1 and a2 > a3:
                p[i, 1, j] = 1.
            elif a3 >= a2 and a3 >= a1:
                p[i, 2, j] = 1.

    p = p.transpose(0, 2, 1)
    return p

Running your test, I saw a time improvement of ~30% versus get_prob_nb.
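If you want to see how the gain scales with core count, Numba lets you set the thread count at runtime. A rough sketch (the thread counts are just examples, and the numbers will depend on the machine):

import time
import numba

get_prob_nb_parallel(aa, 1000, 3000)      # warm-up so compilation is not timed
for n in (1, 2, 4):
    numba.set_num_threads(n)
    start = time.time()
    get_prob_nb_parallel(aa, 1000, 3000)
    print(n, "threads:", time.time() - start)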

Answer from HOBE (score 0):
import numpy as np

def get_prob_keepdims(aa):
    max_values = aa.max(axis=1, keepdims=True)   # shape (num_params, 1, num_action)
    p = np.equal(aa, max_values).astype(float)   # 1.0 wherever an entry equals its column maximum
    return p.transpose(0, 2, 1)

The function get_prob_keepdims uses the keepdims=True parameter, which keeps the reduced axis in the result of the max operation so that the maxima broadcast directly against the original array. Based on my understanding of the provided code, I believe this function should behave the same as the original get_prob function.
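One caveat: np.equal marks every entry that ties with the maximum, so on an exact tie this version can put 1.0 in more than one position, while the Numba version always picks a single action. With continuous random inputs that should never happen; a small sketch of the difference:

import numpy as np

# Tie-free random data: the keepdims version should match the Numba version from the question.
rng = np.random.default_rng(0)
aa = rng.uniform(0.0, 1.0, (1000, 3, 3000))
print(np.array_equal(get_prob_keepdims(aa), get_prob_nb(aa, 1000, 3000)))  # expected True

# With an exact tie, the keepdims version marks both maxima:
tie = np.array([[[0.5], [0.5], [0.1]]])   # shape (1, 3, 1)
print(get_prob_keepdims(tie))             # [[[1. 1. 0.]]]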

In my test of 100 iterations, the keepdims version was slightly faster than the non-parallel Numba version on a Mac M1.