Error in bootstrap function in R: data is constant, returning NAs


I'm currently working on a bootstrap function in R using the 'boot' package for statistical analysis. My aim is to investigate whether there is a statistically significant difference in the divergence points between eye fixations (binary data coded 0/1) on various target objects versus competitor objects presented on the screen over time. Here's the structure of my code:

# packages I'm using in my script
library(viridisLite)
library(boot)
library(polycor)
library(mgcv)
library(Rmisc)
library(ggplot2)
library(dplyr)
library(tidyr)

# bootstrap function
boot_alttime <- function(original_data, resample_indices) {
  
  dat_resample <- original_data[resample_indices,]
  
  # prepare the resample data for testing

  dat_asso <- dat_resample %>%  # associative effect
    # keep only rows where either the target or competitor were fixated
    filter(fixated == 1 & condition %in% c("Agent Consistent Target","Agent Consistent Distractor")) %>%
    # create a new variable indicating whether or not the target was fixated
    mutate(pTarget = ifelse(condition == "Agent Consistent Target", 1, 0)) %>%
    # average fixation proportions by participant and time
    dplyr::group_by(Participant, Time) %>%
    dplyr::summarise(MeanFixation = mean(pTarget)) 
  
  dat_con <- dat_resample %>% # strategic effect
    filter(fixated == 1 & condition %in% c("Agent Consistent Target","Agent Inconsistent Target")) %>%
    # create a new variable indicating whether or not the target was fixated
    mutate(pTarget = ifelse(condition == "Agent Consistent Target", 1, 0)) %>%
    # average fixation proportions by participant and time
    dplyr::group_by(Participant, Time) %>%
    dplyr::summarise(MeanFixation = mean(pTarget)) 
  
  # apply a statistical test at each timepoint for each effect
  # test for associative effect 
  test_asso <- dat_asso %>%
    dplyr::group_by(Time) %>%
    dplyr::summarise(t=t.test(MeanFixation, mu = .5)$statistic[[1]])
  
  # test for strategic effect
  test_con <- dat_con %>%
    dplyr::group_by(Time) %>%
    dplyr::summarise(t=t.test(MeanFixation, mu = .5)$statistic[[1]])
  
  # return a TRUE/FALSE vector of significant positive t-scores (positive means more looks to the target than competitor)
  t_asso <- test_asso$t > 1.96
  t_con <- test_con$t > 1.96
  
  # create empty vectors to store onsets
  onset_asso <- onset_con <- c()
  
  # find the index of the earliest run of 10 sequential TRUEs 
  for (i in 1:(length(t_asso)-10)) { 
    onset_asso[i] <- sum(t_asso[i:(i+9)]) == 10
    onset_con[i] <- sum(t_con[i:(i+9)]) == 10
  }
  
  # find the difference between onsets
  delta_assocon <- which(onset_con)[1] - which(onset_asso)[1]
  
  # return
  # note: the bootstrap returns the indices of the respective timepoints, not absolute times. The annotations to the right of each value (e.g. t[,1]) indicate where in the boot object the bootstrapped onset distributions can be found.
  c(
    delta_assocon,         # t[,1]
    which(onset_asso)[1],  # t[,2]
    which(onset_con)[1]    # t[,3]
  )
}

# calling the bootstrap function
CogLoad_bootres_alttime <- boot::boot(
  # dataset to bootstrap
  data = CogLoad_dat_gen_boot,      
  # bootstrap function                          
  statistic = boot_alttime,       
  # stratification variable                          
  strata = CogLoad_dat_gen_boot$StrataVars, 
  # number of iterations                          
  R = Niter) # where Niter is 2000
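
One place NAs can enter the function above: `which(onset_con)[1]` is NA whenever a resample contains no window of 10 consecutive TRUEs, and any NA in the t-scores turns `sum(...) == 10` into NA, which `which()` then silently drops. A minimal sketch of the onset search as a standalone helper makes that case explicit. The name `first_run_onset` and its `run_length` argument are hypothetical and not part of my script:

```r
# Hypothetical helper: index of the first run of at least `run_length`
# consecutive TRUEs, or NA_integer_ if no such run exists.
first_run_onset <- function(x, run_length = 10) {
  x <- !is.na(x) & x                      # treat NA t-scores as "not significant"
  runs <- rle(x)                          # run-length encoding of TRUE/FALSE
  ends <- cumsum(runs$lengths)            # last index of each run
  starts <- ends - runs$lengths + 1       # first index of each run
  hit <- which(runs$values & runs$lengths >= run_length)
  if (length(hit) == 0) NA_integer_ else as.integer(starts[hit[1]])
}

sig <- c(FALSE, rep(TRUE, 3), FALSE, rep(TRUE, 12))
first_run_onset(sig)             # 6: the run of 12 TRUEs starts at index 6
first_run_onset(rep(FALSE, 20))  # NA: no run of 10 TRUEs in this vector
```

Unlike the `for` loop over `1:(length(t_asso)-10)`, this version also checks the final window and does not misbehave when a vector has fewer than 10 elements.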

Description of Data:

My data consists of observations from an experiment involving participants' eye fixations while listening to sentences and looking at objects on the screen. Here's a brief overview of the columns in the (.csv) dataset:

Participant: Participant ID
Participant.Gender: Gender of the participant
Time: Time in milliseconds
Trial: Trial number
Item: Item number
sentence: Sentence text
sentencetype: Type of sentence
speakergender: Gender of the speaker
verboffset: Verb offset
verbonset: Verb onset
Audio.Gender.Consistency: Consistent if the participant gender = speaker gender, otherwise Inconsistent
condition: Experimental condition
fixated: 0 if not fixated or 1 if fixated
StrataVars: Variables used for stratified bootstrap

Here's a preview of a random sample of rows from my dataset (comma-separated, as in the .csv):

Participant,Participant.Gender,Time,Trial,Item,sentence,sentencetype,speakergender,verboffset,verbonset,Audio.Gender.Consistency,condition,fixated,StrataVars
5M1,Male,300,46,9,"I used to dream of becoming a great plumber",Gendered,Male,1265,905,Consistent,"Agent Consistent Target",1,"5M1 Agent Consistent Target 300"
10F2,Female,1450,37,6,"Later, I am going to use the new urinal",Gendered,Male.png,1700,1275,Inconsistent,"Agent Consistent Target",0,"10F2 Agent Consistent Target 1450"
1F1P,Female,100,16,7,"I really wanted to become a good princess",Gendered,Female,1587,1112,Consistent,"Agent Inconsistent Target",0,"1F1P Agent Inconsistent Target 100"
3F3,Female,800,19,21,"I have decided to buy a nice necklace",Gendered,Female,1523,1206,Consistent,"Agent Inconsistent Distractor",1,"3F3 Agent Inconsistent Distractor 800"
4F4,Female,550,21,27,"I have decided to wear the new perfume",Gendered,Female,1729,1373,Consistent,"Agent Inconsistent Target",0,"4F4 Agent Inconsistent Target 550"
11F3,Female,900,59,25,"I have decided to use the nice chainsaw",Gendered,Male,1626,1135,Inconsistent,"Agent Consistent Target",0,"11F3 Agent Consistent Target 900"

Here's what I've tried so far to debug the issue:

  • Added a constant-data check: I integrated an if_else statement with 'summarise' in the bootstrap function (boot_alttime) to ensure that the data is not constant within each resampled group. This check was intended to address the possibility of constant data leading to NA values in the statistics.
  • Checked for dplyr incompatibilities with the other packages I'm using.
  • Checked variable types: I changed some variable types to see if that would change anything. StrataVars and Time need to be factors; fixated needs to be numeric for the t-tests.
  • Changed the number of iterations for bootstrapping.

# Example of code snippet added for the constant check
summarise(
  t = if_else(length(unique(MeanFixation)) == 1, NA_real_, t.test(MeanFixation, mu = 0.5)$statistic[[1]])
)
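
One caveat about the snippet above: `dplyr::if_else()` evaluates both its `true` and `false` arguments, so `t.test()` still runs on a constant group and can still stop with "data are essentially constant" (or "not enough 'x' observations" for a single value). A minimal sketch of an alternative, assuming a hypothetical helper called `safe_t` that is not part of my script, uses an ordinary `if` plus `tryCatch()` so the test is never a hard failure:

```r
# Hypothetical helper: one-sample t statistic against `mu`,
# returning NA_real_ instead of raising an error.
safe_t <- function(x, mu = 0.5) {
  x <- x[!is.na(x)]
  # t.test() needs at least two observations
  if (length(x) < 2) return(NA_real_)
  tryCatch(
    unname(stats::t.test(x, mu = mu)$statistic),
    # constant (or near-constant) data makes t.test() stop; map that to NA
    error = function(e) NA_real_
  )
}

safe_t(c(0.6, 0.7, 0.8))  # a finite t statistic
safe_t(c(1, 1, 1))        # NA: data are constant
safe_t(0.7)               # NA: only one observation
```

Inside the bootstrap function this would be called as `dplyr::summarise(t = safe_t(MeanFixation))`. It only prevents the error; timepoints where the resampled data are constant still yield NA, and those NAs then affect the onset search.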

Despite these checks, the issue with NA values persists. I suspect this might be due to the lack of variability in my data, which could be problematic for either the resampling procedure or the statistical tests being applied within the boot_alttime function. However, I'm having trouble pinpointing the exact source of the problem.
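
To narrow this down, it may help to count how often each returned statistic is NA across replicates, since the `t` matrix of a boot object has one row per replicate and one column per statistic. The following is only a sketch on toy data; `toy` and `toy_stat` are made-up stand-ins for my dataset and for `boot_alttime`:

```r
library(boot)

set.seed(1)
toy <- data.frame(x = rnorm(30))

# Toy statistic that returns NA in its first element for some resamples,
# mimicking an onset that cannot be found in every replicate.
toy_stat <- function(d, i) {
  m <- mean(d$x[i])
  c(if (m > 0) m else NA_real_, m)
}

res <- boot(data = toy, statistic = toy_stat, R = 200)

# Share of replicates with NA, per returned statistic (columns of res$t)
na_share <- colMeans(is.na(res$t))
na_share
```

A share close to 1 for a column would suggest the statistic fails in nearly every resample (pointing at the function itself), whereas a small share would suggest that only some resamples lack a detectable onset.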

Could someone please review my code and offer insights into why the bootstrap function is returning NAs? Any suggestions on how to troubleshoot and resolve this problem would be greatly appreciated. Thank you in advance!
