How scipy.ndimage.median_filter works for even sizes

1.3k Views Asked by At

Has someone found/understood how works scipy.ndimage.median_filter for even sizes? Because I tested a lot of theories and tried to read the source code, but I haven't an explanation

(Of course it's better to use odd sizes to avoid shifts, but it's just interesting how exactly median_filter counts it for even numbers...)

Problem definition:

from scipy.ndimage import median_filter
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
print('Size 3:')
print(median_filter(arr, size=3, cval=0, mode='constant'))
print('Size 2:')
print(median_filter(arr, size=2, cval=0, mode='constant'))
#with cval=0, mode='constant' we set that input array is extended with zeros 
#when window overlaps edges, just for visibility and ease of calculation

Output

Size 3
[[0. 2. 0.]
 [2. 5. 3.]
 [0. 5. 0.]]
Size 2
[[0. 1. 2.]
 [1. 4. 5.]
 [4. 7. 8.]]

As we can see for size 3 all values are like expected, but values for size 2 are not obvious. If suppose that window has shape (2, 2) let's, for example, define and count all possible windows and medians for array element "9". We can cover "9" (or any other value from array) with window of shape (2, 2) in 4 different variants as shown below,

enter image description here

And corresponding median values would be: for red rectangle - 3, yellow - 7, green - 0, blue - 4... But the output value is 8. Even if imagine that afterwards some additional operations are applied to these medians (mean, median etc.), these operation are not obvious.

Interesting fact #1
It seems that for 1d median_filter just takes size-1 (so, odd) number. Example:

arr = np.array([1., 2., 3., 4., 5.])
print('1 - ', median_filter(arr, size=1, cval=0, mode='constant'))
print('2 - ', median_filter(arr, size=2, cval=0, mode='constant'))
print('3 - ', median_filter(arr, size=3, cval=0, mode='constant'))
print('4 - ', median_filter(arr, size=4, cval=0, mode='constant'))
print('5 - ', median_filter(arr, size=5, cval=0, mode='constant'))
print('6 - ', median_filter(arr, size=6, cval=0, mode='constant'))

Output

1 - [1. 2. 3. 4. 5.]
2 - [1. 2. 3. 4. 5.]
3 - [1. 2. 3. 4. 4.]
4 - [1. 2. 3. 4. 4.]
5 - [1. 2. 3. 3. 3.]
6 - [1. 2. 3. 3. 3.]

Interesting fact #2
In general median_filter with even numbers works quite well, blurs images, it's even hard to determine some difference or shifts

2

There are 2 best solutions below

1
Michael Sohnen On

In scipy.ndimage.median_filter, a special case occurs when size = 2. Take a look at the source code at https://github.com/scipy/scipy/blob/v1.8.1/scipy/ndimage/_filters.py#L1351-L1396

The median filter calls the function _rank_filter, which you can find here: https://github.com/scipy/scipy/blob/4cf21e753cf937d1c6c2d2a0e372fbc1dbbeea81/scipy/ndimage/_filters.py#L1244

In the function, you'll find the code:

if rank < 0:
    rank += filter_size
if rank < 0 or rank >= filter_size:
    raise RuntimeError('rank not within filter footprint size')
if rank == 0:
    return minimum_filter(input, None, footprint, output, mode, cval,
                          origins)
elif rank == filter_size - 1:
    return maximum_filter(input, None, footprint, output, mode, cval,
                          origins)

Additionally, earlier in that function rank=size//2.

In this case, the program is performing a minimum filter, and not a true median filter (which would involve averaging).

Both the conditions rank==0 and rank==size-1 are true, but control flow prioritizes rank==0, which corresponds to the minimum filter in the code.

Sizes >=4 may work differently.

0
Mariia Tkachenko On

Thanks to Michael Sohnen answer now I understand how it works!

I'll describe the calculation process step by step for the example in "Problem definition" for size=2 in my question

Due to the source code https://github.com/scipy/scipy/blob/4cf21e753cf937d1c6c2d2a0e372fbc1dbbeea81/scipy/ndimage/_filters.py#L1244-L1307

  1. At first, footprint is calculated from size. In our case size=2, so footprint will be an array with shape (2, 2) filled with True (Lines 1252-1258)

  2. Then, filter_size is calculated (Line 1267)

    filter_size = numpy.where(footprint, 1, 0).sum()
    

    numpy.where(footprint, 1, 0) returns 1 if an array element is True and 0 if False, all footprint elements are True, it means filter_size is equal to 4

  3. rank is calculated (Lines 1268-1279)

    if operation == 'median':   <---- our case
      rank = filter_size // 2 
    

    it means in our case rank is equal to 2

  4. finding of the appropriate condition (Lines 1280-1307)

      if rank < 0:                            #<---- not our case
          rank += filter_size
      if rank < 0 or rank >= filter_size:     #<---- not our case
          raise RuntimeError('rank not within filter footprint size')
      if rank == 0:                           #<---- not our case
          return minimum_filter(input, None, footprint, output, mode, cval,
                        origins)
      elif rank == filter_size - 1:           #<---- not our case
          return maximum_filter(input, None, footprint, output, mode, cval,
                        origins)
      else:                                   #<---- our case!
          ...
          _nd_image.rank_filter(input, rank, footprint, output, mode, cval,
                        origins)
          ...
    

    So, rank_filter will be applied to the array. How works the rank filter?
    It sorts values in the given window and then takes an element with rank-index from this sorted array. For example, rank=2 and window is

    4 3 
    2 1 
    

    Sorted window is [1, 2, 3, 4] and the element on the second index (starting from zero) is 3.

    Special cases of rank filter are minimal filter (in this case rank=0), maximal filter (rank=length_of_an_array) and median filter

  5. And finally, applying of the rank filter to the original array
    enter image description here

    Blue window for element "1", sorted [0 0 0 1], the second element of the sorted array - 0
    Red window for element "2", sorted [0 0 1 2], the second element of the sorted array - 1
    Green window for element "3", sorted [0 0 2 3], the second element of the sorted array - 2
    And so on... So, the first raw of the output array [0, 1, 2]

Now answers match the output!