How to add a name to a vector before creating a dataframe of means sorted by group and variable

Thanks for looking at this!

I want a function that builds tables of statistics (such as the mean) for specific variables, segregated into groups.

Below is the start of a function that works up to a point! My example uses the built-in mtcars data.

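library(dplyr)
library(data.table)  # assumed source of transpose(); its output uses V1, V2, ... column names

MeansbyGroup<-function(var){
  M1<-mtcars %>% group_by(cyl)
  n1=deparse(substitute(var))
  r1<-transpose(M1 %>% summarise(disp=mean(var)))[2,]
}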


# EXAMPLE using mtcars

df=MeansbyGroup(mtcars$disp)
df[nrow(df) + 1,] =MeansbyGroup(mtcars$drat)
df

# The above will output
           V1         V2         V3
2   230.721875 230.721875 230.721875
2.1   3.596563   3.596563   3.596563

# which are not even the right means!

# Below are the correct values, but I can't automate building a table like I want
M1<-mtcars %>% group_by(cyl)
transpose(M1 %>% summarise(disp=mean(disp)))[2,]
transpose(M1 %>% summarise(drat=mean(drat)))[2,]

## Here is my desired output of means disaggregated into columns by the group "cyl"
## if the function worked right with the above example

            V1         V2         V3
disp  105.1364   183.3143      353.1
drat  4.070909   3.585714   3.229286
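The numbers in the first example output are the overall means of each column, not per-group means: mtcars$disp is passed in as a plain vector holding all 32 cars, so mean(var) inside summarise() never sees the grouping and returns the same value for every cyl group. A quick base R check (tapply() is used here only for comparison; it is not part of the original code):

```r
# Passing mtcars$disp hands the function a plain numeric vector of all 32 cars,
# so mean(var) gives the same overall mean for every group.
overall <- mean(mtcars$disp)
print(overall)

# Per-group means of disp, split by cyl (names are "4", "6", "8").
by_cyl <- tapply(mtcars$disp, mtcars$cyl, mean)
print(by_cyl)
```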

As you can see, the function includes n1=deparse(substitute(var)) to capture the variable name, which I would like to appear in the first column instead of 2 and 2.1 as shown in the example output.
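One detail worth checking: deparse(substitute(var)) returns the argument expression exactly as written in the call, so with the example call MeansbyGroup(mtcars$disp) the captured name would be "mtcars$disp" rather than "disp". A small base R sketch, using a hypothetical capture_name() helper that mimics the n1 line:

```r
# Hypothetical helper mimicking the n1 line in MeansbyGroup().
capture_name <- function(var) deparse(substitute(var))

# substitute() returns the unevaluated argument, so the label keeps the
# "mtcars$" prefix when called the way the example calls MeansbyGroup().
print(capture_name(mtcars$disp))

# With a bare column name the label is just the column name
# (disp does not need to exist, because it is never evaluated).
print(capture_name(disp))
```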

I've tried a few techniques, but when I try to add n1 to the vector, it destroys the values of the means!
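The attempts aren't shown, so this is an assumption, but a common way this happens is combining the name and the means with c(): an atomic vector holds a single type, so every mean is coerced to character. Storing the label as a row name keeps the values numeric. A minimal base R sketch (disp_means is a hypothetical vector holding values from the desired output):

```r
# Hypothetical values, copied from the desired output above.
disp_means <- c(105.1364, 183.3143, 353.1)

# Prepending the name with c() coerces every element to character,
# so the means are no longer numbers.
mixed <- c("disp", disp_means)
print(class(mixed))

# Using the label as a row name instead leaves the values numeric.
labelled <- rbind(disp = disp_means)
print(labelled)
```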

Also, I'd like to make the function more general. I'd prefer the function call to look like MeansbyGroup(var,group,dataframe), which for the above example would be called as MeansbyGroup(disp,cyl,mtcars).

Thanks!

Answer by Gregor Thomas

Here's how I would code your table outside of a function:

library(dplyr)
library(tibble)
mtcars %>% 
  group_by(cyl) %>%
  summarize(across(c(disp, drat), mean)) %>%
  column_to_rownames("cyl") %>%
  t
#               4          6          8
# disp 105.136364 183.314286 353.100000
# drat   4.070909   3.585714   3.229286

Using across() is handy when you have multiple variables. To put this inside a function, we need deparse(substitute()) for the grouping column because column_to_rownames() requires the column name as a string. For the other verbs we can use the friendly {{ }} (curly-curly) operator:

foo = function(data, group, vars) {
  grp_name = deparse(substitute(group))
  data %>% 
    group_by({{group}}) %>%
    summarize(across({{vars}}, mean)) %>%
    column_to_rownames(grp_name) %>%
    t
}

foo(data = mtcars, group = cyl, vars = c(disp, drat))
#               4          6          8
# disp 105.136364 183.314286 353.100000
# drat   4.070909   3.585714   3.229286
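If you would rather avoid the dplyr and tibble dependencies, a roughly equivalent base R sketch is below. It is not part of the answer above: means_by_group() is a hypothetical name, and it takes column names as strings rather than bare names, which sidesteps substitute() entirely.

```r
# Hypothetical helper: group means of several columns, one row per variable.
# Column names are passed as strings, e.g. group = "cyl".
means_by_group <- function(data, group, vars) {
  # aggregate() returns one row per group level (sorted), one column per variable.
  agg <- aggregate(data[vars], by = data[group], FUN = mean)
  # Use the group labels as row names, then keep only the variable columns
  # and transpose so each variable becomes a row and each group a column.
  rownames(agg) <- agg[[group]]
  t(agg[vars])
}

res <- means_by_group(mtcars, "cyl", c("disp", "drat"))
print(res)
```

Usage mirrors foo(): the call above returns a 2 x 3 numeric matrix with rows disp and drat and columns 4, 6 and 8.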