Constrained baseline longitudinal model in R

46 Views Asked by At

I have a longitudinal dataset for a randomised controlled trial, where there are two arms (arm), and an outcome (Y) assessed at three timepoints (period). The first period is 0, and is the baseline score, recorded before randomisation.

I have simulated a dataset with:

library(simstudy)
set.seed(420)

data <- defData(varname = "arm", dist = "binary", formula = 0.5)
data <- defData(data, varname = "Y0", dist = "normal", formula = 10, variance = 2)
data <- defData(data, varname = "Y1", dist = "normal", formula = "Y0 + 5 + 5 * arm", variance = 2)
data <- defData(data, varname = "Y2", dist = "normal", formula = "Y0 + 10 + 5 * arm", variance = 2)
data <- genData(500, data)
data <- addPeriods(data, nPeriods = 3, idvars = "id", timevars = c("Y0", "Y1", "Y2"), timevarName = "Y")

data <- data %>%
  mutate(
    arm = as.factor(arm),
    period = as.factor(period)
  )

The summary stats for the score at each timepoint and within each arm are:

data %>%
  group_by(arm, period) %>%
  summarise(mean = mean(Y))

# A tibble: 6 × 3
# Groups:   arm [2]
  arm   period  mean
  <fct> <fct>  <dbl>
1 0     0       10.1
2 0     1       14.9
3 0     2       20.2
4 1     0       10.2
5 1     1       20.2
6 1     2       25.2

I wish to model the mean difference at each timepoint using a constrained baseline model (https://bmjopen.bmj.com/content/6/12/e013096). Simply, this fits a regression model with a fixed effect for time period and the interaction between time period and treatment group (no main effect for treatment). This has the effect of 'constraining' the baseline, or assuming that it is from the same distribution (a fair assumption in most RCTs). Including the main effect allows the baseline to differ between the groups, which is not desired in this case.

I have attempted to fit this model using the lme4 package:

model1 <- lmer(Y ~ period + period:arm + (1 | id), data = data)

However, when fitting the emmeans, I have noticed that the baseline scored (which should be identical) are instead equal to the group specific means:

emmeans(model1, ~ arm | period)

period = 0:
 arm emmean    SE  df lower.CL upper.CL
 0     10.1 0.113 943     9.83     10.3
 1     10.2 0.113 943     9.97     10.4

period = 1:
 arm emmean    SE  df lower.CL upper.CL
 0     14.9 0.113 943    14.71     15.1
 1     20.2 0.113 943    20.01     20.5

period = 2:
 arm emmean    SE  df lower.CL upper.CL
 0     20.2 0.113 943    19.94     20.4
 1     25.2 0.113 943    25.01     25.5

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

This has happened because the formula that was specified in the model has expanded out to the interactions at each timepoint (including baseline):

colnames(model.matrix(model1))

[1] "(Intercept)"  "period1"      "period2"      "period0:arm1" "period1:arm1" "period2:arm1"

I was expecting to see:

[1] "(Intercept)"  "period1"      "period2"      "period1:arm1" "period2:arm1"

(with no period0:arm1 term).

My question is: how can I modify this model so that it is actually a constrained baseline model?

1

There are 1 best solutions below

0
kaz7 On

Your period variable is a factor, so when you model it in lme4 it is dummy-coded, thus why interactions are included for period0 in addition to period1 and period2. As far as I'm aware, you can't ask lme4 to only include certain values of a variable. You would have to cull your data to exclude period0 and run on just that data for your baseline model.

baseline_data <- data[data$period != 0, ] # filter out period0

baseline_model <- lmer(Y ~ period + period:arm + (1 | id), data = baseline_data)

In this case, when you run colnames(model.matrix(baseline_model)) you won't have period1 in the list because the intercept is period1 (where before the intercept was period0).