The function below throws an error because of a recursive default argument reference. This is intended behavior because default arguments are evaluated inside the scope of the function.
i = 1
f1 = function(i=i) i
f1() # Error: promise already under evaluation: recursive default argument reference
But if i is inside single brackets [, then there is no error, and the function returns x instead of the intended x[i]:
x = 1:5
i = 1:3
f2 = function(x, i=i) x[i]
f2(x) # No error, but returns x, not x[i]
This is a silent error that leads to subtle bugs. For example, in machine learning we may think we are subsetting the training set (x[i]), but are actually using both the training and testing set (x). The behavior occurs even if the variable i doesn't exist:
x = 1:5
if (exists("i")) rm(i)
f3 = function(x, i=i) x[i]
f3(x) # No error, but returns x, even if i doesn't exist.
A more reasonable behavior is when i is inside double brackets ([[ instead of [), which throws a missing subscript error:
x = 1:5
i = 1:3
f4 = function(x, i=i) x[[i]]
f4(x) # Error: missing subscript
My questions are:
- Is the behavior of f2 and f3 intended or is it a bug?
- If it is not a bug, then can someone explain the reasoning behind why it is intended? I briefly looked over the R source code for subsetting, but my knowledge of C is not enough to understand the behavior of f2.
This is an edit to my original answer:
I think it's not a bug, though you can argue that it's a design flaw.
First, there's an easy explanation for the difference between x[i] and x[[i]] behaviour: x[] is legal, and returns x. x[[]] is not legal, because it says to extract something, but doesn't say what.
Now, why did I say it is not a bug? Take a look at this example, a little simpler than yours, and not using the primitive function [:
Created on 2023-12-05 with reprex v2.0.2
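A minimal sketch of such an example, assuming a closure f that only checks missing(farg) and a wrapper g that passes its own argument straight through. The names follow the discussion here, but the function bodies are illustrative assumptions:

```r
# Hypothetical example: f() only asks whether its argument is missing;
# it never evaluates farg when it is.
f <- function(farg) {
  if (missing(farg)) "missing" else "not missing"
}

# g() passes its own (possibly missing) argument straight through to f().
g <- function(garg) f(garg)

f(1)   # farg supplied
g(1)   # garg supplied, so farg is not missing either
g()    # garg is missing in g(), and f() sees farg as missing too
```

The last call is the interesting one: f(garg) looks like a call with an argument, yet missing(farg) reports TRUE because garg itself was never supplied.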
The function f() tests its argument using missing(farg). This doesn't ever evaluate the default value, it just reports on whether the argument was missing or not. So f() never ends up trying to evaluate farg.
The function [ is like f: if the index is missing, it just returns the whole vector, it doesn't try to evaluate the index. Since it never tries to evaluate i, it doesn't generate an error.
New addition:
But this explanation is incomplete. @Roland suggested looking at code like this (I've changed names to match my example more closely):
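A minimal sketch of that situation, assuming a function g1 whose index defaults to the constant 1 (the vector used here is illustrative):

```r
# Hypothetical g1(): i is not supplied by the caller, but has a constant default.
g1 <- function(x, i = 1) x[i]

x <- c(10L, 20L, 30L)
g1(x)       # i falls back to its default, so this evaluates x[1]
g1(x, 2:3)  # an explicit index is used as given
```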
Here i is missing in the call to g1(), but now it has a default value. So when R tries to evaluate x[i] it will check whether i is missing. It is evaluated in the context of g1(), where missing(i) would return true, but now we have a default value, so the missingness doesn't propagate, we get the default value 1 substituted for i and end up evaluating x[1].
Now what if the default had been i = i, as in the original question? Now when determining if i is missing in x[i], R will substitute the default value and check that. It finds that yes, i is missing, so x[i] returns the same thing as x[].
So why is it a design flaw? My first answer is that there's a question of whether missingness should propagate through function calls. "Obviously" the argument is not missing in f(garg), and yet R sees it as missing because garg was missing. You would get the results you expected if missingness depended on the form of the call, not the value in it. But that's not how R works.
My second answer is that it's a flaw because propagation of missingness is handled in a pretty subtle way. Arguments with default values are replaced with their default before deciding what to propagate.
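One way to sidestep the problem, sketched under the assumption that the default is meant to refer to a variable i defined outside the function: give the parameter a different name from that variable, so the default is no longer self-referential. The names idx and f2_renamed are hypothetical.

```r
x <- 1:5
i <- 1:3

# The default refers to i, which is not a parameter of this function,
# so R finds it in the function's enclosing environment.
f2_renamed <- function(x, idx = i) x[idx]

f2_renamed(x)        # subsets with the outer i
f2_renamed(x, 4:5)   # an explicit index still works
```

With this version, if i does not exist anywhere, the call should fail with an "object not found" error instead of silently returning x.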