Make R error out when accessing undefined columns in dataframe


This site has lots of questions on how to fix an "undefined column" error.

I have the exact opposite question: how to make an "undefined column" error.

I frequently change variable names in my files.

This leads to the following error:

r$> df <- data.frame(gender=c(1,1,NA,0))
r$> sum(is.na(df$male))
[1] 0

when the correct result is 1.

I want R to raise an error if the column I'm trying to access is undefined, rather than fail silently.

How can I do that?

Konrad Rudolph On BEST ANSWER

Unfortunately R is rather too lenient when it comes to such matters. The $ operator for data.frames is defined to allow accessing non-existent columns and to return NULL in that case.

There are alternative data.frame implementations which are a bit stricter. Notably, the tbl_df data structure used by the Tidyverse packages ‘tibble’, ‘dplyr’, etc. will at least show you a warning:

df <- tibble::tibble(gender = c(1, 1, NA, 0))
sum(is.na(df$male))
# [1] 0
# Warning message:
# Unknown or uninitialised column: `male`.

Alternatively, you can make this a hard error by overriding $ for data.frames:

registerS3method(
  '$', 'data.frame',
  \(x, name) {
    stopifnot(name %in% colnames(x))
    NextMethod('$')
  }
)
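As a quick check of the override, here is a minimal sketch that registers the same method for 'data.frame' and then accesses one existing and one missing column. The explicit stop() message and the variable names n_missing and msg are illustrative assumptions, not part of the answer above. Since name must match a column name exactly, this also disables partial matching for $, and the override applies to every data.frame in the session, including those used inside other packages.

```r
# Register a strict `$` method for plain data.frames (same idea as above,
# with an explicit error message instead of stopifnot()).
registerS3method(
  "$", "data.frame",
  function(x, name) {
    if (!name %in% colnames(x)) {
      stop("undefined column: ", name, call. = FALSE)
    }
    NextMethod("$")
  }
)

df <- data.frame(gender = c(1, 1, NA, 0))

# Existing column: behaves as before.
n_missing <- sum(is.na(df$gender))

# Missing column: now an error; capture its message instead of aborting.
msg <- tryCatch(df$male, error = function(e) conditionMessage(e))
```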

However, note that this will only apply to plain data.frames, not to tibbles, since the latter also override $. There does not seem to be an option to make this a hard error for tibbles (short of making all warnings into errors); this might be a nice feature request for the package. Alternatively, you can make the above code apply to tibbles by replacing 'data.frame' with 'tbl_df'.
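For completeness, the "all warnings into errors" route mentioned above can be sketched with base R's options(warn = 2). This assumes the tibble package is installed (the code falls back to a placeholder string otherwise) and that tibble's unknown-column warning is promoted like any other warning; the variable names old and msg are illustrative. Note that this setting affects every warning in the session, not just this one.

```r
# Promote all warnings to errors, remembering the previous setting.
old <- options(warn = 2)

msg <- if (requireNamespace("tibble", quietly = TRUE)) {
  df <- tibble::tibble(gender = c(1, 1, NA, 0))
  # The unknown-column warning should now surface as an error;
  # capture its message instead of aborting.
  tryCatch(sum(is.na(df$male)), error = function(e) conditionMessage(e))
} else {
  "tibble not installed"
}

# Restore the previous warning level.
options(old)
```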

Alessio On

Did you mean

sum(is.na(df$gender))

because you did not create the column "male"? With the correct code, the result for the column "gender" is 1. Also, the "r$>" prompt is not needed when posting code.