This is probably intended behaviour, but I hadn't noticed it before. If you want to change several columns at once with dplyr's `across()` and refer to the columns by their index, that index changes when you group the data frame by a variable.
What I mean is the following:
Let's say we have this data frame:

```r
library(dplyr)

df = data.frame(
  group = c("a", "a", "b"),
  val1 = 1:3,
  val2 = 2:4,
  val3 = 3:5
)
df_grouped = df %>%
  group_by(group)
```
If we then want to change columns 2 to 4 (`val1` to `val3`), we can do this:

```r
df %>%
  mutate(across(2:4, ~"changed"))
```

and the result is:

```
  group    val1    val2    val3
1     a changed changed changed
2     a changed changed changed
3     b changed changed changed
```
However, when I run the same code on the grouped data frame, I get this:

```
Error in `mutate()`:
ℹ In argument: `across(2:4, ~"changed")`.
Caused by error in `across()`:
! Can't subset columns past the end.
ℹ Location 4 doesn't exist.
ℹ There are only 3 columns.
```
So I have to do this instead:

```r
df_grouped %>%
  mutate(across(1:3, ~"changed"))
```

As far as I can tell, it just leaves the grouping column out of the count. Is there any way to prevent that?
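A small check of that observation, assuming dplyr 1.0.0 or later (for `across()`): with a grouped data frame, positions inside `across()` are counted over the non-grouping columns only, so position 1 is `val1`, and `everything()` also skips `group`.

```r
library(dplyr)

df = data.frame(
  group = c("a", "a", "b"),
  val1 = 1:3,
  val2 = 2:4,
  val3 = 3:5
)
df_grouped = df %>% group_by(group)

# Position 1 refers to val1 here, because `group` is not part of the selection
shifted = df_grouped %>%
  mutate(across(1, ~"changed"))

# everything() leaves the grouping column untouched as well
all_cols = df_grouped %>%
  mutate(across(everything(), ~"changed"))

print(shifted)
print(all_cols)
```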
What you're seeing is a consequence of using integers as column indices. Although that works on ungrouped data, it isn't mentioned in `?group_by` or in the vignettes; the documented uses of `group_by()` rely on variables or computations to do what you need. The reason is that `across()` only selects among the columns it is allowed to modify, and in `df_grouped` the `group` column can't be modified because it defines the current group. You can confirm this by counting the columns visible inside the grouped `mutate()`: for each group there are only three (`val1`, `val2`, `val3`), as the error message says. Because of this, `2:4` reaches past the end of the available columns.

Ways to deal with this:
- Use column names rather than positions.
- `ungroup()` before you mutate. This works here because your operation is fairly benign, but realize that you lose the "by-group" logic of whatever you're really doing.
- A small bit of a hack is available if you really don't want to hard-code the column names.
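The three options might look roughly like this; a sketch assuming dplyr 1.0.0 or later. The third variant is only one possible way to avoid typing the names, not necessarily the hack the answer had in mind: it looks the positions up in `names(df_grouped)`, which still includes `group`, and then selects those names with `all_of()`.

```r
library(dplyr)

df = data.frame(
  group = c("a", "a", "b"),
  val1 = 1:3,
  val2 = 2:4,
  val3 = 3:5
)
df_grouped = df %>% group_by(group)

# 1. Select by name: a range of names does not depend on column positions
by_name = df_grouped %>%
  mutate(across(val1:val3, ~"changed"))

# 2. Drop the grouping first, so positions count over all columns again
#    (any by-group logic in the real computation is lost)
ungrouped = df_grouped %>%
  ungroup() %>%
  mutate(across(2:4, ~"changed"))

# 3. Translate positions into names using the full column order
#    (names() of a grouped tibble still includes `group`),
#    then select those names with all_of()
cols = names(df_grouped)[2:4]
by_position = df_grouped %>%
  mutate(across(all_of(cols), ~"changed"))

print(by_name)
print(ungrouped)
print(by_position)
```

In the third variant the chosen positions must not include the grouping column: `across()` cannot modify `group`, so selecting it by name fails just like `2:4` did.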