Why is my filter method not removing some elements that should be removed?

197 Views Asked by At

I am trying to create a method called filer_out! that takes in an array and a proc, and returns the same array but with every element that returns true when it is run through the proc, with the caveat being we can't use Array#reject!

I wrote this:

def filter_out!(array, &prc)
    array.each { |el| array.delete(el) if prc.call(el)}
end

arr_2 = [1, 7, 3, 5 ]
filter_out!(arr_2) { |x| x.odd? }
p arr_2

but when I run the code, what prints out is:

[7, 5]

even though the answer should be:

[]

Upon review of the solution I see Array#uniq was used first:

def filter_out!(array, &prc)
    array.uniq.each { |el| array.delete(el) if prc.call(el) }
end

arr_2 = [1, 7, 3, 5 ]
filter_out!(arr_2) { |x| x.odd? }
p arr_2

and the correct output was displayed:

[]

So I guess what my question is, why do you have to use Array#uniq in order to get the correct solution? thanks for your help!

3

There are 3 best solutions below

2
Ninh Le On BEST ANSWER

The problem here is the method delete modify the original array. Here the deal if you put some information out:

def filter_out!(array, &prc)
  array.each.with_index do |el, i|
    p "Current index #{i}"
    p "Current array #{array}"
    p "Current element #{el}"
    array.delete(el) if prc.call(el)
  end
end

arr_2 = [1, 7, 3, 5 ]
filter_out!(arr_2) { |x| x.odd? }
# Output:
#"Current index 0"
# "Current array [1, 7, 3, 5]"
# "Current element 1"
# "Current index 1"
# "Current array [7, 3, 5]"
# "Current element 3"

Explain:

  • The first time, the element is 1, after it's deleted the array is [7, 3, 5]
  • The second time, index in the iteration is 1, it gets the current element with this index in the current array, in this case, is 3 not 7 and delete it, after deleting the array is [3, 5]
  • After two times it stops iteration because the current index is out of range of the current array

By using uniq you get the right result because array.uniq it creates a copy of the original array when the original array is modified, it still iteration as expect.

0
Stefan On

Traversing an array uses an internal "cursor" of some kind that points to the current element, e.g.:

[ 1, 7, 3, 5 ]  # 1st step
# ^

[ 1, 7, 3, 5 ]  # 2nd step (increment cursor)
#    ^

# etc.

If you remove the current element during iteration, the cursor immediately points to the next element, thus skiping an element when it gets incremented on the next turn:

[ 1, 7, 3, 5 ]  # 1st step
# ^

[ 7, 3, 5 ]     # 1nd step (remove element under cursor)
# ^

[ 7, 3, 5 ]     # 2nd step (increment cursor)
#    ^

[ 7, 5 ]        # 2nd step (remove element under cursor)
#    ^

# done

A typical work-around is to iterate the array in reverse order, i.e.:

[ 1, 7, 3, 5 ]  # 1st step
#          ^

[ 1, 7, 3 ]     # 1nd step (remove element under cursor)
#          ^

[ 1, 7, 3 ]     # 2nd step (decrement cursor)
#       ^

[ 1, 7 ]        # 2nd step (remove element under cursor)
#       ^

# etc.

Note that the cursor might be out of bounds in this algorithm, so you have to be careful in that regard.

The above as Ruby code:

def filter_out!(array)
  (array.size - 1).downto(0) do |i|
    array.delete_at(i) if yield array[i]
  end
end
1
Jörg W Mittag On

As a general rule, not just for this question, and not just for Ruby, it is never a good idea to mutate a collection while you are iterating over it. Just don't do that. Ever.

Actually, personally, I just think you should avoid any mutation at all, as far as feasible and sensible, but that might be a tad extreme.

Your code is also committing another cardinal sin: never mutate an argument. Ever. You should only use arguments to compute the result, you should never mutate them. That is hugely surprising to anyone who calls your method, and in programming, surprises are dangerous. They lead to bugs and security holes.

Lastly, you are using the bang ! naming convention wrong. The bang is used to mark the "more surprising" of a pair of methods. You should only have a filter_out! method if you also have a filter_out method. And speaking of style, indentation in Ruby is 2 spaces, not 4, as per standard community coding style.

Okay, having said that, let's look at what happens in your code.

Here is the relevant part of the implementation of Array#each from the Rubinius Ruby implementation, defined in core/array.rb#L62-L77:

def each
  i = @start
  total = i + @total
  tuple = @tuple

  while i < total
    yield tuple.at(i)
    i += 1
  end
end

As you can see, it is just a simple while loop incrementing the index every time. Other Ruby implementations are similar, e.g. here is a simplified version of JRuby's implementation in core/src/main/java/org/jruby/RubyArray.java#L1805-L1818:

public IRubyObject each(ThreadContext context, Block block) {
    for (int i = 0; i < size(); i++) {
        block.yield(context, eltOk(i));
    }
}

Again, just a simple index loop.

In your case, we are starting with an array, whose backing store looks like this:

1 7 3 5

On the first iteration of the each, the iteration counter is at index 0:

1 7 3 5
↑

1 is odd, so we delete it, and the situation now looks like this:

7 3 5
↑

The last thing we do in the loop iteration is to increase the iteration counter by one:

7 3 5
  ↑

Okay, in the next iteration, we check again: 3 is odd, so we delete it:

7 5
  ↑

We increase the iteration counter:

7 5
    ↑

And now we have the exit condition of our loop: i is no longer less than the size of the array: i is 2 and the size is also 2.

Note that in the JRuby implementation, the size is checked with a call to the size() method, which means it is re-calculated every time. In Rubinius, however, the size is cached, and only calculated once before the loop starts. Therefore, Rubinius will actually try to keep going here and access the third element of the backing tuple which doesn't exist, which leads to this NoMethodError exception:

NoMethodError: undefined method `odd?' on nil:NilClass.

Accessing a non-existing element of a Rubinius::Tuple returns nil, and each is then passing that nil to the block, which tries to call Integer#odd?.

It is important to note that Rubinius is doing nothing wrong here. The fact that it raises an exception is not a bug in Rubinius. The bug is in your code: mutating a collection while iterating over it is simply not allowed.

All of this now explains why the solution which calls Array#uniq first works: Array#uniq returns a new array, so the array you are iterating over (the one returned returned from Array#uniq) and the array you are mutating (the one referenced by the array parameter binding) are two different arrays. You would have gotten the same result with, e.g. Object#clone, Object#dup, Enumerable#map, or many others. Just as an example:

def filter_out(array)
  array.map(&:itself).each { |el| array.delete(el) if yield el }
end

arr_2 = [1, 7, 3, 5]
filter_out(arr_2, &:odd?)
p arr_2

However, a more idiomatic Ruby solution would be something like this:

def filter_out(array, &blk)
  array.delete_if(&blk)
end

arr_2 = [1, 7, 3, 5]
arr_3 = filter_out(arr_2, &:odd?)
p arr_3

This fixes all the problems with your code:

  • It doesn't mutate the array.
  • It doesn't mutate any argument.
  • In fact, there is no mutation going on at all.
  • It uses existing methods from the Ruby core library instead of re-inventing the wheel.
  • No bang !
  • Two spaces indentation.

The only real question is whether the method is needed at all, or if it would make more sense to just write

arr_2 = [1, 7, 3, 5]
arr_3 = arr_2.delete_if(&:odd?)
p arr_3