Ensuring n<5 cells in gtable do not provide results

Question

Ensuring n<5 cells in gtable do not provide results

69 Views Asked by Stats Iliterate At 19 August 2023 at 19:42

I hope someone can help me with this.

In the example dataset below, participants are divided into Group A and B. The objective is to ensure that when the gtable is produced, when the n is below 5, there will be a blank cell. In this example, contrary to what the code below shows, the objective would be for all ethnicity groups characterising females in group A to be ommitted (because their n is 3 or 0), for the ethinicity group B, corresponding to males in group A to be ommitted (n = 3), etc.

The overall purpose is to implement what is called a statistical disclosure control, whereby if the n in a cell is below a number, then the results for that cell are not disclosed, to avoid potential identification of the participants.

In this example the columns for gender across group A and B all have n>5, namely 9, 27, 15, and 27. The objective would also be to have the entire column not have data if the n was below 5 (same rationale as the one applied on a per cell basis).

Any help would be much appreciated, thank you

Group <- c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
               "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
           "A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
           "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
           "A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
           "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B")
Sex <- c("M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
           "M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F",
         "M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
         "M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F",
         "M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
         "M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F")
Height <- c(170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182, 
            181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182,
            170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182, 
            181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182,
            170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182, 
            181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182)
Ethnicity <- c("A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
              "E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F",
              "A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
              "E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F",
              "A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
              "E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F")

df <- data.frame(Group, Sex, Height, Ethnicity)

df %>%
  tbl_strata(strata = Group,
             .tbl_fun = ~ .x %>%
               tbl_summary(by = Sex,
                           statistic = list(all_continuous() ~ "{mean} ({sd})",
                                            all_categorical() ~ "{n} / {N} ({p}%)"),
                           digits = all_continuous() ~ 1,
                           missing_text = "(Missing Data)")) %>%
  modify_header(label ~ "**Relevant Data**") %>%
  modify_caption("**Table 1. Key descriptives**") %>%
  bold_labels()

^{Created on 2023-08-19 with reprex v2.0.2}

Original Q&A

There are 1 best solutions below

**Ben** · Accepted Answer · 2023-08-20T15:25:18.673000

Using gtsummary you can modify the table body, and perform your own operations to substitute empty cell data depending on values.

For example, if you want to look at the table body, try:

tab <- df %>%
  tbl_strata(strata = Group,
             .tbl_fun = ~ .x %>%
               tbl_summary(by = Sex,
                           statistic = list(all_continuous() ~ "{mean} ({sd})",
                                            all_categorical() ~ "{n} / {N} ({p}%)"),
                           digits = all_continuous() ~ 1,
                           missing_text = "(Missing Data)"))

And then:

tab$table_body

You will see stat_1_1, stat_1_2, etc. as columns to modify:

  variable  var_label row_type label     var_type_1  stat_1_1     stat_2_1      var_type_2  stat_1_2     stat_2_2     
  <chr>     <chr>     <chr>    <chr>     <chr>       <chr>        <chr>         <chr>       <chr>        <chr>        
1 Height    Height    label    Height    continuous  188.3 (10.0) 173.7 (9.3)   continuous  175.8 (8.4)  177.0 (10.6) 
2 Ethnicity Ethnicity label    Ethnicity categorical NA           NA            categorical NA           NA           
3 Ethnicity Ethnicity level    A         categorical 3 / 9 (33%)  6 / 27 (22%)  categorical 6 / 15 (40%) 0 / 27 (0%)  
4 Ethnicity Ethnicity level    B         categorical 3 / 9 (33%)  3 / 27 (11%)  NA          NA           NA           
5 Ethnicity Ethnicity level    C         categorical 3 / 9 (33%)  12 / 27 (44%) categorical 3 / 15 (20%) 0 / 27 (0%)  
6 Ethnicity Ethnicity level    D         categorical 0 / 9 (0%)   6 / 27 (22%)  categorical 3 / 15 (20%) 9 / 27 (33%) 
7 Ethnicity Ethnicity level    E         NA          NA           NA            categorical 0 / 15 (0%)  15 / 27 (56%)
8 Ethnicity Ethnicity level    F         NA          NA           NA            categorical 3 / 15 (20%) 3 / 27 (11%)

Using modify_table_body can you mutate across those columns, and check if the value is less than 5.

While there are multiple options to extract the number from a character value (in this case, with '/' and a percentage in parentheses), if we make the assumption you want the first number (which is 'n'), you can use parse_number from readr.

Here is a complete example:

library(gtsummary)
library(tidyverse)

df %>%
  tbl_strata(strata = Group,
             .tbl_fun = ~ .x %>%
               tbl_summary(by = Sex,
                           statistic = list(all_continuous() ~ "{mean} ({sd})",
                                            all_categorical() ~ "{n} / {N} ({p}%)"),
                           digits = all_continuous() ~ 1,
                           missing_text = "(Missing Data)")) %>%
  modify_header(label ~ "**Relevant Data**") %>%
  modify_caption("**Table 1. Key descriptives**") %>%
  modify_table_body(~ .x %>% 
                      mutate(across(starts_with("stat_"), 
                                    ~ifelse(parse_number(.) < 5, 
                                            "", 
                                            .)))) %>% 
  bold_labels()

Table

Ensuring n<5 cells in gtable do not provide results

There are 1 best solutions below

Related Questions in R

Related Questions in TIDYVERSE

Related Questions in GTSUMMARY

Related Questions in GTABLE

Trending Questions

Popular # Hahtags

Popular Questions