Rules on when to use quotes and when not to in R

When you start learning R, it can be very confusing to know when you have to put something in quotation marks and when you don't.

For example, when selecting the column mpg from mtcars, I use quotes with the [ notation but omit them with $. The output, however, is identical.

unquoted <- mtcars$mpg
quoted <- mtcars[, "mpg"]
identical(unquoted, quoted)
#> [1] TRUE

It gets even more complicated when I use tidyverse functions. Are there any general rules on when to use quotes and when not to?

PS: I'm asking this question on behalf of many confused students.

Answer by Ratnanil

Here are some general rules off the top of my head:

When to use quotes

  • Filenames
  • Text (character string) arguments to functions, e.g. method = "spearman" in cor()
  • Colors given by name, e.g. "red"
  • Column Names
    • if used within [
    • if used as arguments within functions which are not part of the tidyverse (see below)
  • Package names in library calls:
    • installation: install.packages("mypackage")
    • loading: library("mypackage") (here the quotes are optional; library(mypackage) works too)
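
A minimal base-R sketch of the quoted cases above. The temporary file and the small data frames `left` and `right` are made up for illustration, and `install.packages()` is left out so the code runs offline:

```r
# Filenames are character strings (a temporary path here)
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path)
cars_back <- read.csv(path, row.names = 1)

# String-valued arguments and colour names are quoted
pasted  <- paste("a", "b", sep = "-")  # the separator is a string
rgb_red <- col2rgb("red")               # colour given by name

# Column names inside [ and [[ are quoted
mpg_bracket <- mtcars[, "mpg"]
mpg_double  <- mtcars[["mpg"]]

# Many non-tidyverse functions, e.g. merge(), take column names as strings
left   <- data.frame(id = 1:3, x = c("a", "b", "c"))
right  <- data.frame(id = c(1, 3), y = c(TRUE, FALSE))
merged <- merge(left, right, by = "id")

# library() accepts the package name quoted or unquoted ...
library("stats")
library(stats)
# ... but a name stored in a variable needs character.only = TRUE
pkg <- "stats"
library(pkg, character.only = TRUE)
```

`library()` treats its first argument as a literal package name unless `character.only = TRUE` is given, which is why the version using `pkg` needs that extra argument.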

When not to use quotes

  • Objects in your environment
  • Numbers, if you want to treat them as such
  • TRUE/FALSE
  • Column Names
    • if used within tidyverse functions (e.g. group_by, mutate, summarise, ggplot, pivot_*, *_join)
    • if used with the $ sign (e.g. df$mycolumn; quotes are allowed but unnecessary, as in df$"mycolumn")
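
A short base-R sketch of the unquoted cases (the object `x` is just an example; the tidyverse cases are left out so the code runs without extra packages):

```r
# Objects in your environment are used by their bare name
x <- 5
value_of_x <- x + 1   # the value stored in x, plus one
string_x   <- "x"     # just the one-character string "x", not the object

# Unquoted numbers are numeric; quoted ones are text
sum_numbers <- 1 + 1
is_num_text <- is.numeric("1")   # FALSE: "1" is a character string
text_plus   <- tryCatch("1" + 1, error = function(e) conditionMessage(e))
# text_plus holds an error message: you cannot do arithmetic on text

# TRUE/FALSE unquoted are logical values; quoted they are strings
logical_true <- isTRUE(TRUE)     # TRUE
string_true  <- isTRUE("TRUE")   # FALSE
```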

Answer by user2554330

Putting quotes around something makes it a character constant. Leaving them off means it will be treated as an expression that could be evaluated.
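
To see the difference, `quote()` captures a bare name without evaluating it, so the two forms can be compared side by side (a small sketch; the variable names are arbitrary):

```r
quoted_name   <- "mpg"        # a character constant
unquoted_expr <- quote(mpg)   # the bare name mpg, captured but not evaluated

class(quoted_name)     # "character"
class(unquoted_expr)   # "name"

# Evaluating the name looks it up, here among the columns of mtcars
from_name <- eval(unquoted_expr, envir = mtcars)

# Converting between the two forms
as.name(quoted_name)          # character -> name
as.character(unquoted_expr)   # name -> character
```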

The tricky thing is that R is very flexible: the functions you call can ask it to treat character constants as expressions and then evaluate them, or they can convert expressions to character constants without ever evaluating them.
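
A sketch of both directions in base R. `show_name()` is a hypothetical helper built on the `deparse(substitute())` idiom, which turns an argument's expression into text without evaluating it; `get()` and `parse()` go the other way, from a string to an evaluated object:

```r
# Hypothetical helper: returns the *text* of its argument without evaluating it
show_name <- function(arg) deparse(substitute(arg))
label <- show_name(no_such_object)   # "no_such_object", no error raised

# The other direction: strings turned into lookups or evaluated code
y <- 42
looked_up <- get("y")                      # the object named "y"
evaluated <- eval(parse(text = "y * 2"))   # the string parsed, then evaluated
```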

For your examples:

In mtcars$mpg, the mpg is given as a name (i.e. a simple expression), but when evaluating mtcars$mpg R converts it to a character value and looks up the column with that name. You could have done the conversion yourself, i.e. mtcars$"mpg" is fine and gives you the same thing, it's just extra typing.
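
A small sketch of why this matters: because `$` never evaluates what follows it, a column name stored in a variable does not work with `$`, while `[[` does (`col` is an arbitrary variable name):

```r
# $ converts the name to a string itself, so quoting it is optional
same <- identical(mtcars$mpg, mtcars$"mpg")

# But $ never evaluates what follows it
col <- "mpg"
dollar_col  <- mtcars$col      # NULL: looks for a column literally named "col"
bracket_col <- mtcars[[col]]   # evaluates col, then selects the "mpg" column
```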

In mtcars[,"mpg"], you are explicitly giving the name of the column as a character constant. If you had written mtcars[, mpg] then R would assume you wanted to evaluate mpg and use its value to select a column. This is inconsistent with the previous syntax, because [ is different from $. You just need to remember that.
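
A sketch of the `[` behaviour described above. `local()` keeps the deliberately confusing variable named `mpg` from leaking into your workspace:

```r
# Quoted: the string "mpg" is used directly as the column name
by_string <- mtcars[, "mpg"]

# Unquoted: R first evaluates mpg as an ordinary variable. With no such
# object in scope, this fails with an "object not found" error.
no_var <- tryCatch(mtcars[, mpg], error = function(e) conditionMessage(e))

# If a variable called mpg does exist, its *value* decides the column
by_value <- local({
  mpg <- "cyl"     # deliberately confusing variable name
  mtcars[, mpg]    # selects the cyl column, not mpg
})
```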

Many tidyverse functions use more complicated rules to change how R evaluates things, pushing R's flexibility a long way. The rules they use are consistent within tidyverse functions, but won't be consistent with other R functions.
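
As an illustration, here is a sketch assuming a recent version of dplyr (1.0 or later) is installed; the code is skipped if it is not. Bare column names are looked up inside the data, while the `.data` pronoun, `all_of()` and `{{ }}` are the ways dplyr offers to pass a column name held in a string or in a function argument. `mean_of()` is a hypothetical helper:

```r
# Assumes dplyr >= 1.0 is installed; otherwise the example is skipped.
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  # Data masking: bare column names are looked up inside the data frame
  four_cyl <- filter(mtcars, cyl == 4)

  # A column name held in a string needs an explicit bridge
  col <- "mpg"
  via_data   <- summarise(mtcars, m = mean(.data[[col]]))$m  # .data pronoun
  via_all_of <- select(mtcars, all_of(col))                  # tidyselect helper

  # Passing a bare column name through your own function: embrace it
  mean_of <- function(data, column) {
    summarise(data, m = mean({{ column }}))$m
  }
  via_embrace <- mean_of(mtcars, mpg)
}
```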

So what is the general rule? There isn't one. The only advice I can think of is "read the help page for the function you are using." And if you write your own functions, don't be too clever: have pity on the poor user.
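
One way to follow that advice, as a sketch: a hypothetical helper that always takes the column name as a plain string, so users know to quote it and can also pass it in a variable:

```r
# Hypothetical helper: the column is always passed as a plain string
column_mean <- function(data, column) {
  stopifnot(is.character(column), length(column) == 1, column %in% names(data))
  mean(data[[column]])
}

mpg_mean  <- column_mean(mtcars, "mpg")
which_col <- "hp"
hp_mean   <- column_mean(mtcars, which_col)   # a variable works the same way

# Forgetting the quotes fails right away instead of doing something surprising
forgot_quotes <- tryCatch(column_mean(mtcars, mpg), error = function(e) "error")
```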