Comparing groups when variable has multiple occurences per case

Question

Comparing groups when variable has multiple occurences per case

76 Views Asked by Lauri Pautola At 07 December 2023 at 05:21

First, I admit, I've asked this elsewhere (Crossvalidated), but I guess it is not necessarily the same people reading both forums.

I'm finishing a manuscript, but have one nagging problem which I'm unable to find an answer.

Study has two groups which receive different procedure for a disease. After the procedure, one patient may face multiple different adverse events (complications). I want to test if there is a difference in total number of adverse events between groups.

Obviously it would be all too easy with a simple single categorical variable, but I'm in doubt if using chi-squared or Fisher's test is the right approach. If I'm not mistaken, their presumptions differ as this is not simple on/off per case-situation.

One idea I got was to create a continuous dummy variable for total number of complications per case and compare those with t-test or something, but it still answers a bit different question imho.

The exact problem without reported answer can be found for example from the CHOCOLATE-trial (https://www.bmj.com/content/363/bmj.k3965.long) table 3, last 3 rows.

Here is a reproducible code of the end result.

# Values are strored in vectors where each level means if a complication occured. Level 0 means there was none
# All variables have similar levels
# If variable compl_1 would have level 0 (there was no complication on that case), 
# variables compl_2 and compl_3 would also have in real dataset.
# This is not true for the following reproducible example,
# but it hopefully still delineates the problem.
# Here we have a sample size of 200 cases, 100 for each group

data <- data.frame(group = as.factor(ifelse(runif(200) > 0.5, 0, 1)),
                   compl_1 = c(rep(0, 140), rep(1, 40), rep(2, 14), rep(3,4), rep(4,2)), 
                   compl_2 = c(rep(0, 182), rep(1, 10), rep(2, 7), rep(3,1), rep(4,0)), 
                   compl_3 = c(rep(0, 187), rep(1, 5), rep(2, 7), rep(3,1), rep(4,0)))

# Then we count totals for each complication type by pivoting long, doing the count and pivoting back
table <- data |> 
  pivot_longer(compl_1:compl_3) |> 
  count(group, value) |> 
  pivot_wider(names_from = group, values_from = n, names_sort = T, values_fill = 0)

# Of course count for level 0 is bonkers as there were 100 cases per group
# So this is fixed by replacing it with the count of 0-levels in compl_1
no_complication <- data |> 
  dplyr::select(group, compl_1) |> 
  count(group, compl_1) |> 
  pivot_wider(names_from = group, values_from = n, names_sort = T, values_fill = 0) |> 
  slice_head(n=1)
table[1,] <- no_complication[1,]

# Next we add a row that contains the total number of complication (occurences of levels 1-4 for compl-variables)
complications_total <- table |> 
  slice(-1) |> 
  mutate(`0` = sum(`0`, na.rm = T),
         `1` = sum(`1`, na.rm = T)) |> 
  slice(1)

#To make end result clearer, let's add some names to cases
complications_total$value <- factor(complications_total$value, labels = "Total complications")
table$value <- factor(table$value, labels = c("No complication",
                                              "Head cut",
                                              "Foot off",
                                              "Hand torn",
                                              "Eyes blown"))

#Then we add the row of totals and give names to operations
table <- table |> 
  add_row(complications_total[1,]) |> 
  arrange(desc(`0`)) |> 
  rename(Henry = `0`,
         Martin = `1`)

So in other words, I want to add a new column with p-values and with the correct statistical test. Comparing group "No complications" is obviously easy, but for others I can't come up with anything.

Edit: After some extra thought, I think I've found a solution. Was probably overthinking this. I'll just take case based mean of each complication and compare those. I guess this is just a variant of what Brian suggested, so I'll flag that as the solution. Thanks for both!

Original Q&A

There are 2 best solutions below

Rui Barradas On 07 December 2023 at 07:58

First, I will rewrite the code to table the values. It seems that you are still getting values that add up to greater than 200 cases.

suppressPackageStartupMessages({
  library(tidyverse)
  library(lme4)
})

set.seed(2023)

data <- data.frame(
  group = as.integer(runif(200) > 0.5),
  compl_1 = c(rep(0, 140), rep(1, 40), rep(2, 14), rep(3,4), rep(4,2)), 
  compl_2 = c(rep(0, 182), rep(1, 10), rep(2, 7), rep(3,1), rep(4,0)), 
  compl_3 = c(rep(0, 187), rep(1, 5), rep(2, 7), rep(3,1), rep(4,0))
)

lbl_group <- c("Henry", "Martin")
lbl_complication <- c(
  "No complication", "Head cut", "Foot off", "Hand torn", "Eyes blown"
  # , "Total complications"
)

tbl <- data %>%
  mutate(group = factor(group, labels = lbl_group)) %>%
  table() %>%
  apply(1:2, sum) %>%
  t() %>%
  as.data.frame() %>%
  rownames_to_column(var = "Complication") %>%
  mutate(Complication = factor(Complication, labels = lbl_complication)) %>%
  arrange(desc(Henry))

tbl
#>      Complication Henry Martin
#> 1 No complication    66     74
#> 2        Head cut    26     14
#> 3        Foot off     7      7
#> 4       Hand torn     4      0
#> 5      Eyes blown     2      0

colSums(tbl[-1L])
#> Henry Martin 
#>   105     95

^{Created on 2023-12-07}

As for the question, if you want to analyze differences in complications by individual, an option is a mixed effects model with complication as a regressor and individuals, Person in the code below, as a random effect.
The result of ranef tells you that there are no such differences.
I also include confidence intervals for the fixed effect Complication.

tbl2 <- tbl %>%
  pivot_longer(-Complication, names_to = "Person", values_to = "Count")

mm_person <- lmer(Count ~ Complication + (1|Person), data = tbl2)
#> boundary (singular) fit: see help('isSingular')

# commented out, would clutter the answer's output
# summary(mm_person)
# the following instruction is off-topic
# fixef(mm_person)

# there are no random effects by person
ranef(mm_person)
#> $Person
#>        (Intercept)
#> Henry            0
#> Martin           0
#> 
#> with conditional variances for "Person"

# given the above result, the first 2 rows are
# not interesting, keep the others only
confint(mm_person)
#> Computing profile confidence intervals ...
#>                             2.5 %     97.5 %
#> .sig01                   0.000000   4.627884
#> .sigma                   2.303055   5.638249
#> (Intercept)             64.828860  75.170921
#> ComplicationHead cut   -57.307046 -42.692953
#> ComplicationFoot off   -70.307046 -55.692953
#> ComplicationHand torn  -75.307046 -60.692953
#> ComplicationEyes blown -76.307046 -61.692953

^{Created on 2023-12-07}

**Brian Syzdek** · Accepted Answer · 2023-12-07T06:54:37.650000

As done in the paper, you could calculate the relative risk of a complication in the groups if you just wish to collapse all complications.

library(tidyverse)
library(epitools)
rr_res <- table %>% # From your final 'table' data above
  # Create contingency table, collapsing all complications into one group
  mutate(value = factor(ifelse(value == "No complication", "No complication", "Complication"),
                        # Order for risk of complication
                        levels = c("No complication", "Complication"))) %>% 
  group_by(value) %>% 
  summarise(Henry = sum(Henry), Martin = sum(Martin)) %>% 
  select(-value) %>% 
  as.matrix() %>% 
  t() %>% # Conditions in rows, outcome in columns
  riskratio.wald()
rr_res$measure[2,] %>% round(2)

produces risk ratio with 95%CI of being in Martin group and developing complication.

estimate    lower    upper 
    0.82     0.67     0.99

You might also produce odds-ratio with test.

Comparing groups when variable has multiple occurences per case

There are 2 best solutions below

Related Questions in R

Related Questions in CATEGORICAL-DATA

Trending Questions

Popular # Hahtags

Popular Questions