I hope someone can help me with this.
In the example dataset below, participants are divided into Group A and Group B. The objective is that, when the table is produced, any cell with an n below 5 is left blank. In this example, contrary to what the code below produces, all ethnicity groups for females in Group A should be omitted (because their n is 3 or 0), ethnicity group B for males in Group A should be omitted (n = 3), etc.
The overall purpose is to implement statistical disclosure control, whereby if the n in a cell is below a given number, the results for that cell are not disclosed, to avoid potential identification of the participants.
In this example the Sex columns across Groups A and B all have n > 5, namely 9, 27, 15, and 27. The objective would also be to blank an entire column if its n were below 5 (same rationale as the one applied on a per-cell basis).
Any help would be much appreciated, thank you
Group <- c("A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B",
"A", "B", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B")
Sex <- c("M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
"M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F",
"M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
"M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F",
"M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
"M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F")
Height <- c(170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182,
181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182,
170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182,
181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182,
170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182,
181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182)
Ethnicity <- c("A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
"E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F",
"A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
"E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F",
"A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
"E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F")
df <- data.frame(Group, Sex, Height, Ethnicity)

# Packages needed for the pipe and the summary table
library(dplyr)
library(gtsummary)

# One summary table per Group, split by Sex within each group
df %>%
  tbl_strata(strata = Group,
             .tbl_fun = ~ .x %>%
               tbl_summary(by = Sex,
                           statistic = list(all_continuous() ~ "{mean} ({sd})",
                                            all_categorical() ~ "{n} / {N} ({p}%)"),
                           digits = all_continuous() ~ 1,
                           missing_text = "(Missing Data)")) %>%
  modify_header(label ~ "**Relevant Data**") %>%
  modify_caption("**Table 1. Key descriptives**") %>%
  bold_labels()
Created on 2023-08-19 with reprex v2.0.2
Using gtsummary you can modify the table body and perform your own operations to blank out cell data depending on its values.
For example, to look at the table body, first store the table in an object and then inspect that object's table body.
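A minimal sketch of those two steps, assuming `df` from the question is already in the session and that dplyr and gtsummary are installed. The object name `tbl` is an arbitrary choice for illustration:

```r
library(dplyr)      # provides %>%
library(gtsummary)

# Assumes `df` from the question already exists in the session.
# Step 1: store the stratified summary table in an object
# (`tbl` is a hypothetical name).
tbl <- df %>%
  tbl_strata(
    strata = Group,
    .tbl_fun = ~ .x %>%
      tbl_summary(
        by = Sex,
        statistic = list(
          all_continuous() ~ "{mean} ({sd})",
          all_categorical() ~ "{n} / {N} ({p}%)"
        ),
        digits = all_continuous() ~ 1,
        missing_text = "(Missing Data)"
      )
  )

# Step 2: inspect the data frame that gtsummary renders into the table.
tbl$table_body
```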
You will see stat_1_1, stat_1_2, etc. as columns to modify.
Using modify_table_body, you can mutate across those columns and check whether the value is less than 5.
While there are multiple options to extract the number from a character value (in this case, with '/' and a percentage in parentheses), if we make the assumption that you want the first number (which is 'n'), you can use parse_number from readr.
Here is a complete example:
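The following is a sketch of what such an example could look like, not a definitive implementation. It rebuilds the question's data compactly with `rep()`, then blanks every statistic cell whose first number is below 5. The helper `suppress_small()`, its `threshold` argument, and the object name `tbl_masked` are hypothetical names introduced here for illustration.

```r
library(dplyr)      # %>%, mutate(), across(), starts_with(), if_else()
library(readr)      # parse_number()
library(gtsummary)

# Same data as in the question: each 26-value pattern repeats 3 times.
df <- data.frame(
  Group = rep(c("A", "B", rep("A", 11), rep("B", 13)), times = 3),
  Sex = rep(c("M", "F", "F", "M", "F", "M", "M", "M", "M", "F", "M", "M", "M",
              "M", "M", "F", "M", "M", "M", "M", "M", "F", "M", "F", "M", "F"),
            times = 3),
  Height = rep(c(170, 181, 190, 183, 199, 165, 155, 170, 185, 176, 176, 177, 182,
                 181, 164, 165, 171, 181, 201, 171, 173, 167, 168, 184, 183, 182),
               times = 3),
  Ethnicity = rep(c("A", "A", "B", "A", "A", "C", "C", "B", "C", "C", "C", "D", "D",
                    "E", "E", "D", "D", "D", "E", "E", "E", "A", "F", "C", "D", "F"),
                  times = 3)
)

# Hypothetical helper: replace a statistic string with NA when its first
# number (assumed to be n) is below the threshold. Cells that are already
# NA are returned unchanged.
suppress_small <- function(x, threshold = 5) {
  n <- parse_number(x)
  if_else(!is.na(n) & n < threshold, NA_character_, x)
}

tbl_masked <- df %>%
  tbl_strata(
    strata = Group,
    .tbl_fun = ~ .x %>%
      tbl_summary(
        by = Sex,
        statistic = list(
          all_continuous() ~ "{mean} ({sd})",
          all_categorical() ~ "{n} / {N} ({p}%)"
        ),
        digits = all_continuous() ~ 1,
        missing_text = "(Missing Data)"
      )
  ) %>%
  modify_header(label ~ "**Relevant Data**") %>%
  modify_caption("**Table 1. Key descriptives**") %>%
  bold_labels() %>%
  # Apply the helper to every statistic column (stat_1_1, stat_1_2, ...).
  modify_table_body(
    ~ .x %>% mutate(across(starts_with("stat_"), suppress_small))
  )

tbl_masked
```

Note that this check looks only at the first number in each cell, so a continuous summary whose mean is below 5 would also be blanked. It handles per-cell suppression only and does not blank a whole column based on the column's n.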