How to efficiently apply a function to a NumPy array view in-place?


I have a view of a NumPy array. I want to apply a function to each of its elements and save the result into said view (essentially, I want to do an in-place map). I do not want to use for loops, because they do not benefit from any NumPy optimizations/parallelization. I also cannot do something like arr = map(fn, arr), since it creates a new object.



hpaulj (best answer)

You aren't the first to ask about applying a scalar function to all elements of an array. That comes up often; just search SO questions for "vectorize".

By stressing that this is a view, I assume you want to make sure that the changes apply to the corresponding elements of the base array. Others stress in-place because they think this will save memory or be faster.

If the array is 2d, then the old-fashioned nested loop

 for i in range(arr.shape[0]):
     for j in range(arr.shape[1]):
         arr[i,j] = func(arr[i,j])

takes care of the in-place requirement.
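A minimal runnable sketch of that loop, assuming a hypothetical scalar `func` and a non-contiguous 2d view taken from a larger `base` array. Because `arr[i, j] = ...` writes through the view, the changes show up in `base`.

```python
import numpy as np

def func(x):
    # Hypothetical scalar function used only for illustration.
    return x * x + 1

base = np.arange(12, dtype=float).reshape(3, 4)
arr = base[:, 1:3]          # 2d view of columns 1 and 2 of base

# Element-by-element loop; each assignment writes through the view into base.
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        arr[i, j] = func(arr[i, j])

print(base)
```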

If you can rework func to work with a 1d array, you can do

for i in range(arr.shape[0]):
    arr[i,:] = func(arr[i,:])

func will produce a new array, but those values can be copied back into arr[i,:] (and arr.base) without problem.
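A small sketch of the row-wise version, assuming a hypothetical `func_1d` (here just `np.cumsum`) that takes a 1d array and returns a new one. The slice assignment `arr[i, :] = ...` copies the new values back into the view, and so into `base`.

```python
import numpy as np

def func_1d(row):
    # Hypothetical 1d function: returns a NEW array, does not modify row.
    return np.cumsum(row)

base = np.arange(12).reshape(3, 4)
arr = base[:, 1:]           # view of the last three columns

for i in range(arr.shape[0]):
    # The result is a new array; slice assignment copies it into the view.
    arr[i, :] = func_1d(arr[i, :])

print(base)
```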

The ideal, speed-wise, is a function that works on the whole nd array, using operators and numpy functions. That's fastest, but it will always produce a temporary buffer, which then has to be copied back into the view (though the out parameter of ufuncs can help).

 arr[:] = func(arr)
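To make the difference concrete, here is a sketch with a hypothetical whole-array `func` built from numpy operators. Plain rebinding (`new = func(view)`, like `arr = func(arr)` or `arr = map(fn, arr)`) creates a new object and leaves the base untouched; `view[:] = func(view)` copies the temporary result back; and a ufunc called with `out=view` writes its result directly into the view's memory.

```python
import numpy as np

def func(a):
    # Hypothetical whole-array function made of numpy operators.
    return a * a + 1

base = np.arange(6, dtype=float)
view = base[::2]                 # view of elements 0, 2, 4

new = func(view)                 # rebinding: a new array, base is unchanged
view[:] = func(view)             # temporary result copied back into the view
np.multiply(view, 2, out=view)   # ufunc writes straight into the view

print(base)
```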

Even arr[indx] += 1 uses a temporary buffer, which can be a problem if indx has duplicate indices. For that case, ufuncs have an at method that performs unbuffered in-place operations.

https://numpy.org/doc/stable/reference/generated/numpy.ufunc.at.html
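A short illustration of the duplicate-index issue, using `np.add.at` from the page linked above. The buffered `+=` applies the repeated index only once, while `np.add.at` applies every occurrence; it also works when the target is a view.

```python
import numpy as np

indx = np.array([0, 1, 1, 3])    # index 1 appears twice

a = np.zeros(5, dtype=int)
a[indx] += 1                     # buffered: index 1 is incremented only once

base_b = np.zeros(6, dtype=int)
b = base_b[1:]                   # a view, to show ufunc.at writes through it
np.add.at(b, indx, 1)            # unbuffered: every occurrence is applied

print(a, b, base_b)
```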

There aren't many numpy operations that work in-place. Most produce a new array. It's easier to create a building-block language that way.

There are some tools that "streamline" iterating on an array, but they don't offer any real performance enhancement. But questions come up often about them: np.vectorize, np.frompyfunc, np.nditer, and (my least favorite) np.apply_along_axis. In-place is trickier with these tools.
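A sketch of two of these tools, assuming a hypothetical scalar `func` with a Python-level branch. `np.vectorize` returns a new array, so the result has to be assigned back through the view; `np.nditer` with `op_flags=['readwrite']` can write element by element into the view. Neither is faster than a plain Python loop in any meaningful way.

```python
import numpy as np

def func(x):
    # Hypothetical scalar function with a Python-level branch.
    return x * 10 if x > 3 else x - 1

# np.vectorize: returns a NEW array; assign it back through the view.
base1 = np.arange(8, dtype=float).reshape(2, 4)
view1 = base1[:, ::2]                      # non-contiguous view: columns 0 and 2
vfunc = np.vectorize(func, otypes=[float])
view1[...] = vfunc(view1)

# np.nditer with readwrite: updates the view in place, one element at a time.
base2 = np.arange(8, dtype=float).reshape(2, 4)
view2 = base2[:, ::2]
with np.nditer(view2, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = func(x)                   # x is a 0-d view into base2

print(base1)
print(base2)
```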

But for real performance you have to use a tool that compiles your function, such as numba or cython. There are lots of SO questions about those.
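A sketch of the compiled route, assuming numba is installed; if it isn't, the hypothetical fallback `njit` below just returns the plain Python function so the example still runs (without the speedup). The compiled function is the same nested loop as before, writing each result back into the array it receives, which can be a non-contiguous view.

```python
import math
import numpy as np

try:
    from numba import njit       # assumes numba is available
except ImportError:
    def njit(f):                 # hypothetical fallback: no compilation
        return f

@njit
def apply_inplace(a):
    # Nested loop that overwrites each element of a (a may be a view).
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            a[i, j] = math.sqrt(a[i, j]) + 1.0

base = np.arange(16, dtype=np.float64).reshape(4, 4)
view = base[1:3, ::2]            # non-contiguous 2d view
apply_inplace(view)
```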

Python map just sets up an iteration, which is 'run' with a for or list(). I prefer the list-comprehension notation. None of that is special to numpy.
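A small sketch of that point, assuming a hypothetical scalar `fn`: `map` only builds a lazy iterator over the view's elements, and nothing happens until something consumes it. Here `np.fromiter` consumes it and the values are copied back into the view; the list-comprehension form is shown as a comment.

```python
import numpy as np

def fn(x):
    # Hypothetical scalar function.
    return 3 * x + 1

base = np.arange(6).reshape(2, 3)
view = base[:, 1:]               # view of columns 1 and 2

lazy = map(fn, view.flat)        # just an iterator; nothing computed yet

# Consume the iterator and copy the values back into the view.
view[...] = np.fromiter(lazy, dtype=view.dtype, count=view.size).reshape(view.shape)

# Equivalent list-comprehension form:
# view[...] = np.array([fn(x) for x in view.flat]).reshape(view.shape)
print(base)
```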

Other SO questions deal with multithreading and multiprocessing.

Ali Khosravi

It seems your function is designed to operate on individual elements. The most efficient approach is to rewrite the function in vectorized form. Alternatively, a quick and dirty solution is to let NumPy do it for you; see numpy.vectorize.

But be aware that, depending on the operations executed inside the function, you may get much better performance if you vectorize your function manually.
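A sketch comparing the two options, assuming a hypothetical scalar `f` with an if/else branch. `np.vectorize` still calls `f` once per element in Python; manually vectorizing replaces the branch with `np.where` over the whole array, and the result is written back into the view.

```python
import numpy as np

def f(x):
    # Hypothetical scalar function with an if/else branch.
    return x * 2.0 if x > 0 else 0.0

base = np.array([[-2.0, -1.0, 0.5], [1.0, 2.0, -3.0]])
view = base[:, 1:]               # view of columns 1 and 2

# Quick and dirty: convenient, but essentially a Python loop over elements.
vf = np.vectorize(f, otypes=[float])
quick = vf(view)

# Manual vectorization: the branch becomes np.where on the whole array,
# and slice assignment writes the result back through the view.
view[...] = np.where(view > 0, view * 2.0, 0.0)

print(base)
```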