Consequences of syntactically invalid names


The read.table family (read.table, read.csv, read.delim et al.) has the argument check.names, which is documented as follows:

logical. If TRUE then the names of the variables in the data frame are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by make.names) so that they are, and also to ensure that there are no duplicates.

Say I have loaded a data frame containing syntactically invalid column names. Is there any other consequence apart from having to access a specific column by name using the ` character?


There is 1 best solution below


Check out help(make.names) to understand what it does and why. From that help page:

A syntactically valid name consists of letters, numbers and the dot or underline characters and starts with a letter or the dot not followed by a number. Names such as ".2way" are not valid, and neither are the reserved words.

The definition of a letter depends on the current locale, but only ASCII digits are considered to be digits.

The character "X" is prepended if necessary. All invalid characters are translated to ".". A missing value is translated to "NA". Names which match R keywords have a dot appended to them. Duplicated values are altered by make.unique.
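To see those rules in action, here is a small sketch using base R only. The input strings are made up for illustration; they are not from the question.

```r
# Hypothetical raw header values: starts with a digit, contains a space,
# is a reserved word, and a duplicated name.
raw <- c("2way", "my col", "if", "val", "val")

# Default: invalid names are repaired, duplicates are left alone.
fixed <- make.names(raw)
print(fixed)         # "X2way" "my.col" "if." "val" "val"

# unique = TRUE additionally passes the result through make.unique,
# which is what check.names = TRUE relies on to remove duplicates.
fixed_unique <- make.names(raw, unique = TRUE)
print(fixed_unique)  # "X2way" "my.col" "if." "val" "val.1"
```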

The two that will trip you up most are blank column names (df$`` gives an error) and repeated column names (df$val returns only the first column named val).
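A minimal sketch of the duplicate-name case, assuming a tiny inline CSV (the data is invented for the example). With check.names = FALSE the header is kept verbatim, so the second val column can only be reached by position:

```r
# Inline CSV with a duplicated name and a name containing a space.
csv <- "val,val,my col\n1,2,3\n4,5,6"
df <- read.csv(text = csv, check.names = FALSE)

print(names(df))      # "val" "val" "my col"

# Lookup by name stops at the first match.
print(df$val)         # 1 4
print(df[["val"]])    # 1 4

# The second "val" column is only reachable by position.
print(df[[2]])        # 2 5

# A name with a space needs backticks with `$` (or a string with `[[`).
print(df$`my col`)    # 3 6
print(df[["my col"]]) # 3 6
```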

Outside of that, if you pass this data.frame to a function that expects a data.frame with valid names, it may fail outright or, worse, silently give wrong results that are hard to detect.
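One example of a silent change, sketched with base R only: data.frame() applies check.names = TRUE by default, so rebuilding a data frame through it quietly renames the columns. The column names and values below are invented for illustration. The same sketch shows one way to flag and repair invalid names up front, which is an option rather than the only approach.

```r
# A data frame that deliberately keeps invalid names.
df <- data.frame(`my col` = 1:2, `2way` = 3:4, check.names = FALSE)

# Flag names that make.names would change, i.e. invalid or duplicated ones.
bad <- names(df) != make.names(names(df), unique = TRUE)
print(names(df)[bad])  # "my col" "2way"

# Passing df back through data.frame() renames the columns without warning,
# so later code looking for "my col" no longer finds it.
rebuilt <- data.frame(df, extra = 0)
print(names(rebuilt))  # "my.col" "X2way" "extra"

# Repairing the names explicitly makes the change visible and deliberate.
clean <- df
names(clean) <- make.names(names(clean), unique = TRUE)
print(names(clean))    # "my.col" "X2way"
```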