Replace sub-sub-arrays in an (m, n, 2)-shaped 3D numpy array using a dictionary as a map


I have an input_array of shape (m, n, 2) and a mapping dictionary (Dict[Tuple[int, int], float]). I would like to create a new (m, n) output_array by replacing each innermost pair in the input_array with the value it maps to in the dictionary.

Example:

input_array = np.array([[[1, 2], [3, 7]], [[1, 2], [4, 5]]])
mapping = {(1, 2): 0.7, (3, 7): 0.8, (4, 5): 0.9, (2, 4): 0.3}
---
output_array = np.array([[0.7, 0.8], [0.7, 0.9]])

Attempt:

import time
import numpy as np

def map_values(input_array, mapping):
    output_array = np.empty(input_array.shape[:2], dtype=float)

    tic = time.time()
    # Look up each distinct (a, b) pair once, then write its mapped
    # value into every position where that pair occurs.
    for key in np.unique(input_array.reshape(-1, 2), axis=0):
        value = mapping[tuple(key)]
        mask = np.all(input_array == key, axis=-1)
        output_array[mask] = value
    toc = time.time()
    print(f'mapping loop took {toc - tic:.4f} seconds')

    return output_array
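
Stripped of the timing instrumentation, the attempt can be checked on the example data above (a minimal, self-contained sketch):

```python
import numpy as np

def map_values(input_array, mapping):
    # Replace each innermost (a, b) pair with mapping[(a, b)].
    output_array = np.empty(input_array.shape[:2], dtype=float)
    for key in np.unique(input_array.reshape(-1, 2), axis=0):
        mask = np.all(input_array == key, axis=-1)
        output_array[mask] = mapping[tuple(key)]
    return output_array

input_array = np.array([[[1, 2], [3, 7]], [[1, 2], [4, 5]]])
mapping = {(1, 2): 0.7, (3, 7): 0.8, (4, 5): 0.9, (2, 4): 0.3}
print(map_values(input_array, mapping))  # [[0.7 0.8] [0.7 0.9]]
```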

Is there a way to make the looping and replacement of arrays faster?

1 Answer

mozway:

I would use a simple loop:

out = (np.array([mapping[pair] for pair in
                 map(tuple, input_array.reshape(-1, 2))])
         .reshape(input_array.shape[:-1])
      )
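
On the example data this reads, as a self-contained sketch:

```python
import numpy as np

input_array = np.array([[[1, 2], [3, 7]], [[1, 2], [4, 5]]])
mapping = {(1, 2): 0.7, (3, 7): 0.8, (4, 5): 0.9, (2, 4): 0.3}

# Look up each innermost pair in the dict, then restore the (m, n) shape.
out = (np.array([mapping[pair] for pair in
                 map(tuple, input_array.reshape(-1, 2))])
         .reshape(input_array.shape[:-1]))
print(out)  # [[0.7 0.8] [0.7 0.9]]
```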

Or, if you expect many duplicated pairs, you could restrict the dictionary lookups to the unique values. However, since np.unique then becomes the bottleneck, this might not be much faster when n is large:

tmp, idx = np.unique(input_array.reshape(-1, 2),
                     return_inverse=True, axis=0)

out = (np.array([mapping[tuple(pair)] for pair in tmp])[idx]
         .reshape(input_array.shape[:-1])
      )
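
A quick self-contained check of the unique-values variant on the same example data:

```python
import numpy as np

input_array = np.array([[[1, 2], [3, 7]], [[1, 2], [4, 5]]])
mapping = {(1, 2): 0.7, (3, 7): 0.8, (4, 5): 0.9, (2, 4): 0.3}

# Map only the unique pairs, then scatter values back via the inverse index.
tmp, idx = np.unique(input_array.reshape(-1, 2),
                     return_inverse=True, axis=0)
out = (np.array([mapping[tuple(pair)] for pair in tmp])[idx]
         .reshape(input_array.shape[:-1]))
print(out)  # [[0.7 0.8] [0.7 0.9]]
```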

Output:

array([[0.7, 0.8],
       [0.7, 0.9]])
Timings:

On (2, 2, 2):

# original
86 µs ± 11.4 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

# this answer: simple loop
7.08 µs ± 811 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# this answer: only mapping unique values
60.7 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

On (200, 200, 2):

# original
33.7 ms ± 4.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# this answer: simple loop
45.7 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# this answer: only mapping unique values
30 ms ± 854 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For large m (or n) (2000, 2, 2):

# original
2.78 ms ± 366 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# this answer: simple loop
4.54 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# this answer: only mapping unique values
2.3 ms ± 6.19 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

For large m and n (e.g. (2000, 2000, 2)):

# original
4.5 s ± 67.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# this answer: simple loop
4.29 s ± 35.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# this answer: only mapping unique values
4.31 s ± 108 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As noted by @MatBailie, the simple loop can be further improved using np.fromiter:

N = input_array.shape[-1]
out = (np.fromiter((mapping[pair] for pair in
                    map(tuple, input_array.reshape(-1, N))),
                   float, count=input_array.size // N)
         .reshape(input_array.shape[:-1])
      )
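
As a self-contained sketch of the np.fromiter variant on the example data (passing count lets fromiter allocate the output array once instead of building an intermediate list):

```python
import numpy as np

input_array = np.array([[[1, 2], [3, 7]], [[1, 2], [4, 5]]])
mapping = {(1, 2): 0.7, (3, 7): 0.8, (4, 5): 0.9, (2, 4): 0.3}

N = input_array.shape[-1]
# Consume the generator of looked-up values directly into a float array.
out = (np.fromiter((mapping[pair] for pair in
                    map(tuple, input_array.reshape(-1, N))),
                   float, count=input_array.size // N)
         .reshape(input_array.shape[:-1]))
print(out)  # [[0.7 0.8] [0.7 0.9]]
```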