Creating tables via loop from the levels of one variable

52 Views Asked by At

I'm currently making tables in R and am wondering if there is a way to make my process a lot less tedious. I"m making tables with two variables, however I am going through the levels of the first variable and creating a table for each level based off the second variable. An example of the code is below:

prop.table(table(df[df$v1 == 6 , c("v1" , "v2")]))*100

Variable v1 has 6 levels, i would like to create a loop that would start from 1 up to 6 (all variables are numeric, however not all of them end at 6) and generate tables for each level of v1 to v2.

Variable v2 has two levels, not sure if that is relevant but just in case it is.

Thank you!

EDIT: I've included sample code and desired output tables, apologies for not including these before!

EDIT2: Updated the tables to correctly show the expected outputs. My apologies for all the mistakes while posting this question, next questions will be much better formatted. Thank you everyone for your help!

id <- 1:20
age <- c(1, 2, 2, 3, 1, 1, 4, 5, 6, 6, 5, 4, 4, 4, 2, 1, 1, 2, 3, 1)
gender <- c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 3, 3, 2, 2, 2, 1, 2)
response <- c(1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1)
df <- data.frame(id, age, gender, response)

Here is an example of the table outputs I would like to see from the loop generated. It iterates through each level of the age variable (6 levels) with the response variable. I'd like this loop to apply to the gender variable as well, which has 5 levels.

age response (0) response (1)
1 67% 33%
age response (0) response (1)
2 50% 50%
age response (0) response (1)
3 100% 0%
age response (0) response (1)
4 50% 50%
age response (0) response (1)
5 50% 50%
age response (0) response (1)
6 50% 50%
2

There are 2 best solutions below

1
Jon Spring On BEST ANSWER
library(tidyverse)
df |>
  count(age, response) |>
  complete(age, response, fill = list(n=0)) |>
  mutate(share = n / sum(n), .by = age)

Result

# A tibble: 12 × 4
     age response     n share
   <dbl>    <dbl> <int> <dbl>
 1     1        0     4 0.667
 2     1        1     2 0.333
 3     2        0     2 0.5  
 4     2        1     2 0.5  
 5     3        0     0 0    
 6     3        1     2 1    
 7     4        0     2 0.5  
 8     4        1     2 0.5  
 9     5        0     1 0.5  
10     5        1     1 0.5  
11     6        0     1 0.5  
12     6        1     1 0.5 
0
Adriano Mello On

Please, make sure the expected output is reproducible data as well. You can do it with dput(expected_output) or datapasta::tribble_format(expected_output). Your expected output is inconsistent (as @Friede mentioned) but it's comprehensible enough. This solution isn't concise, but it's easier to follow along and make adjustments (imo).

The input (this is a tidyverse take):

library(tidyverse)

# datapasta::tribble_format()
input <- tibble::tribble(
  ~id, ~age, ~gender, ~response,
  1L,    1,       1,         1,
  2L,    2,       1,         0,
  3L,    2,       1,         1,
  4L,    3,       1,         1,
  5L,    1,       1,         0,
  6L,    1,       1,         0,
  7L,    4,       1,         0,
  8L,    5,       2,         1,
  9L,    6,       2,         1,
  10L,    6,       2,         0,
  11L,    5,       2,         0,
  12L,    4,       1,         1,
  13L,    4,       1,         1,
  14L,    4,       3,         0,
  15L,    2,       3,         0,
  16L,    1,       2,         0,
  17L,    1,       2,         0,
  18L,    2,       2,         1,
  19L,    3,       1,         1,
  20L,    1,       2,         1)

Choose the variables you want to process. You don't need to include them all, and not all of them must be present on input:

variables <- c("age", "gender")

The process have one intermediary step to fill values, combinations etc and is a good place to your adjustments:

aux <- input %>% 
  pivot_longer(any_of(variables), names_to = "variable", values_to = "value") %>% 
  count(variable, value, response) %>% 
  mutate(prop = 100 * prop.table(n), .by = c(variable, value)) %>% 
  complete(nesting(variable, value), response, fill = list(n = 0, prop = 0))

> aux
# A tibble: 18 × 5
   variable value response     n  prop
   <chr>    <dbl>    <dbl> <int> <dbl>
 1 age          1        0     4  66.7
 2 age          1        1     2  33.3
 3 age          2        0     2  50  
 4 age          2        1     2  50  
 5 age          3        0     0   0  
 6 age          3        1     2 100  
 7 age          4        0     2  50  
 8 age          4        1     2  50  
 9 age          5        0     1  50  
10 age          5        1     1  50  
11 age          6        0     1  50  
12 age          6        1     1  50  
13 gender       1        0     4  40  
14 gender       1        1     6  60  
15 gender       2        0     4  50  
16 gender       2        1     4  50  
17 gender       3        0     2 100  
18 gender       3        1     0   0 

The output will merge the total observations by variable-value with proportions calculated in a wide format:

output <- left_join(
  summarise(aux, .by = c(variable, value), n = sum(n)),
  pivot_wider(aux, id_cols = -n, names_from = response, names_glue = "response_{response}", values_from = prop),
  by = c("variable", "value"))

> output
# A tibble: 9 × 5
  variable value     n response_0 response_1
  <chr>    <dbl> <int>      <dbl>      <dbl>
1 age          1     6       66.7       33.3
2 age          2     4       50         50  
3 age          3     2        0        100  
4 age          4     4       50         50  
5 age          5     2       50         50  
6 age          6     2       50         50  
7 gender       1    10       40         60  
8 gender       2     8       50         50  
9 gender       3     2      100          0

Discard the intermediary helper:

rm(aux)

Hope it helps.