Repeated measurement modelling with MIXED command in SPSS - REML does not work for missing data?


I have a dataset from 820 individuals with time-invariant sociodemographic variables (sex, age, country [Austria/Italy]) and one time-varying independent variable, coping behavior (14 sub-scales). The dependent variable is psychological distress. Measurements were taken three times: 650 participants at baseline, 600 at follow-up one, and 550 at follow-up two. The dates of the first two measurements differed between the two countries by several weeks; the last measurement was simultaneous. For the sake of simplicity, I assume the same measurement periods for both countries and only distinguish measurements 1 to 3.

Hypotheses:

  1. Psychological distress differs between measurements.
  2.1 Coping behavior also changes between measurements.
  2.2 Coping behavior "influences" psychological distress to a different degree, depending on the time point.

Missing observations: There are 81 missing entries in the psychological distress variable (dependent) and 99 missing entries in the coping behavior variable (independent). Combined, there are 114 missing entries in total.

Model building: I am trying to fit a linear mixed model for this repeated-measures design in SPSS with the MIXED command. However, I am new to mixed models and I am not sure whether my syntax actually specifies the model I have in mind:

1. Initially I specified a null (unconditional) model to calculate the ICC and test whether a mixed model is appropriate:

MIXED Distress WITH measurement
  /CRITERIA=CIN(95) MXITER(1000) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, 
    ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
  /FIXED= measurement | SSTYPE(3)
  /METHOD=REML
  /PRINT= G R SOLUTION TESTCOV
  /RANDOM= INTERCEPT measurement| SUBJECT(participant) COVTYPE(UN).

Then I calculate the ICC by dividing the Level-2 variance (participant = 0.262) by the sum of the Level-1 error and the Level-2 variance (measurement = 0.066 + participant = 0.262), which gives ICC = 0.80. If I interpret this correctly, it means that 80% of the variance in Distress is due to differences between individuals. The very low Level-1 error means that Distress is relatively stable over time.
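As a quick check of that arithmetic, here is a minimal Python sketch. It assumes that 0.262 is the participant (intercept) variance and 0.066 the residual variance, as I read them from the output; the function name `icc` is only illustrative.

```python
# Variance components as reported above (REML estimates from the MIXED output).
var_between = 0.262  # Level-2 variance: differences between participants
var_within = 0.066   # Level-1 variance: residual variation within participants


def icc(between: float, within: float) -> float:
    """Share of the total variance that lies between participants."""
    return between / (between + within)


icc_value = icc(var_between, var_within)
print(round(icc_value, 2))  # prints 0.8
```

Dividing the Level-1 error by the same sum would instead give about 0.20, which is the share of variance within participants.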

Next, the significant (p < 0.05) estimated variance UN(1,1) tells me that participants differ from each other regarding Distress. UN(2,1) is not significant. UN(2,2) is not significant either, but it would tell me whether participants differ in how Distress changes across measurements.

2. Next, I am trying to test hypothesis 1.

MIXED Distress BY measurement
  /CRITERIA=CIN(95) MXITER(1000) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, 
    ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
  /FIXED=measurement| SSTYPE(3)
  /METHOD=REML
  /PRINT=G  R SOLUTION TESTCOV
  /RANDOM=INTERCEPT | SUBJECT(participant) COVTYPE(UN)
  /REPEATED=measurement| SUBJECT(participant) COVTYPE(AR1).

The results tell me that, compared to the third measurement, the first measurement of Distress is significantly different.

3. To test hypothesis 2.1, I have to build the interaction term measurement x coping:

MIXED Distress BY measurement WITH coping
  /CRITERIA=CIN(95) MXITER(1000) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, 
    ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
  /FIXED=measurement coping measurement*coping| SSTYPE(3)
  /METHOD=REML
  /PRINT=G  R SOLUTION TESTCOV
  /RANDOM=INTERCEPT | SUBJECT(participant) COVTYPE(UN)
  /REPEATED=measurement| SUBJECT(participant) COVTYPE(AR1).

4. I've read that a big advantage of linear mixed models is their ability to deal with missing data. Instead of deleting incomplete cases, mixed models are capable of "estimating" the variance structure of missing data. I tried to figure out how this works. As mentioned at the beginning, 114 entries are missing in total. Adding up the observations across all three measurements gives 650 + 600 + 550 = 1800, so there are 1800 - 114 = 1686 complete observations.
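The counts can be checked with a short Python sketch. The first part only uses the numbers from this post and assumes one row per participant and measurement (long format). The second part uses hypothetical toy data (the `toy` list is invented for illustration) to show what I understand by using all available rows, as opposed to dropping every participant who has any gap.

```python
# Counts from the post; one row per participant and measurement is assumed.
rows_per_wave = {1: 650, 2: 600, 3: 550}
missing_distress = 81   # rows with a missing dependent variable
missing_coping = 99     # rows with a missing coping score
missing_either = 114    # rows missing at least one of the two

total_rows = sum(rows_per_wave.values())
usable_rows = total_rows - missing_either
# Inclusion-exclusion: rows where both variables are missing at once.
missing_both = missing_distress + missing_coping - missing_either
print(total_rows, usable_rows, missing_both)  # prints 1800 1686 66

# Hypothetical toy data: (participant, measurement, distress); None = missing.
toy = [
    ("p1", 1, 2.0), ("p1", 2, 2.5), ("p1", 3, 2.2),
    ("p2", 1, 3.1), ("p2", 2, None), ("p2", 3, 2.9),
]

# Row-wise exclusion: only the incomplete row is dropped.
available_rows = [row for row in toy if row[2] is not None]

# Complete-case deletion: every row of a participant with any gap is dropped.
incomplete_ids = {pid for pid, _, distress in toy if distress is None}
complete_case_rows = [row for row in toy if row[0] not in incomplete_ids]
print(len(available_rows), len(complete_case_rows))  # prints 5 3
```

In the toy data, participant p2 still contributes two rows under row-wise exclusion, but none under complete-case deletion.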

When using the command above, SPSS gives me an output for 1686 observations. After trying to force SPSS to estimate the missing observations with the subcommand /MISSING=INCLUDE, SPSS tells me that it now uses 1800 observations, but the estimates and variances remain exactly the same?

Questions:

  1. Is my modelling technique correct?
  2. How can I test hypothesis 2.2?
  3. What is the difference between COVTYPE(UN) and COVTYPE(VC)?
  4. Is my assumption of COVTYPE(AR1) under the /REPEATED subcommand correct? How do I test this?
  5. Concerning point 4, what am I doing wrong with the /MISSING subcommand? How do I get SPSS to estimate the missing values?

Thank you very much!

I read the IBM SPSS documentation, comments on this topic and tried to figure it out by myself with guides and videos.
