Error in bootstrap function in R data is constant, returning NAs

Question

Asked by Naomi on 19 March 2024 at 18:13

I'm currently working on a bootstrap function in R using the 'boot' package for statistical analysis. My aim is to investigate whether there is a statistically significant difference in the divergence points between eye fixations (binary data coded 0/1) on various target objects versus competitor objects presented on the screen over time. Here's the structure of my code:

# packages I'm using in my script
library(viridisLite)
library(boot)
library(polycor)
library(mgcv)
library(Rmisc)
library(ggplot2)
library(dplyr)
library(tidyr)

# bootstrap function
boot_alttime <- function(original_data, resample_indices) {
  
  dat_resample <- original_data[resample_indices,]
  
  # prepare the resample data for testing

  dat_asso <- dat_resample %>%  # associative effect
    # keep only rows where either the target or competitor were fixated
    filter(fixated == 1 & condition %in% c("Agent Consistent Target","Agent Consistent Distractor")) %>%
    # create a new variable indicated whether or not the target was fixated
    mutate(pTarget = ifelse(condition == "Agent Consistent Target", 1, 0)) %>%
    # average target-fixation proportions by participant and time
    dplyr::group_by(Participant, Time) %>%
    dplyr::summarise(MeanFixation = mean(pTarget)) 
  
  dat_con <- dat_resample %>% # strategic effect
    filter(fixated == 1 & condition %in% c("Agent Consistent Target","Agent Inconsistent Target")) %>%
    # create a new variable indicated whether or not the target was fixated
    mutate(pTarget = ifelse(condition == "Agent Consistent Target", 1, 0)) %>%
    # average target-fixation proportions by participant and time
    dplyr::group_by(Participant, Time) %>%
    dplyr::summarise(MeanFixation = mean(pTarget)) 
  
  # apply a one-sample t-test against 0.5 at each timepoint
  # test for associative effect 
  test_asso <- dat_asso %>%
    dplyr::group_by(Time) %>%
    dplyr::summarise(t=t.test(MeanFixation, mu = .5)$statistic[[1]])
  
  # test for strategic effect
  test_con <- dat_con %>%
    dplyr::group_by(Time) %>%
    dplyr::summarise(t=t.test(MeanFixation, mu = .5)$statistic[[1]])
  
  # return a TRUE/FALSE vector of significant positive t-scores (positive means more looks to the target than competitor)
  t_asso <- test_asso$t > 1.96
  t_con <- test_con$t > 1.96
  
  # create empty vectors to store onsets
  onset_asso <- onset_con <- c()
  
  # flag every window of 10 consecutive significant timepoints (the earliest one is picked with which() below)
  for (i in 1:(length(t_asso)-9)) { 
    onset_asso[i] <- sum(t_asso[i:(i+9)]) == 10
    onset_con[i] <- sum(t_con[i:(i+9)]) == 10
  }
  
  # find the difference between onsets
  delta_assocon <- which(onset_con)[1] - which(onset_asso)[1]
  
  # return the onsets
  # note: the values are indices of time bins, not absolute times. The annotation next to each element (e.g. t[,1]) shows which column of the boot object's t matrix holds that bootstrapped distribution.
  c(
    delta_assocon,          # t[,1]: difference between onsets (strategic - associative)
    which(onset_asso)[1],   # t[,2]: onset of the associative effect
    which(onset_con)[1]     # t[,3]: onset of the strategic effect
    )
}

# calling the bootstrap function
CogLoad_bootres_alttime <- boot::boot(
  # dataset to bootstrap
  data = CogLoad_dat_gen_boot,      
  # bootstrap function                          
  statistic = boot_alttime,       
  # stratification variable                          
  strata = CogLoad_dat_gen_boot$StrataVars, 
  # number of iterations                          
  R = Niter) # where Niter is 2000
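One likely source of the NAs is the return value itself: `which(onset_con)[1]` is NA whenever a resample contains no window of 10 consecutive significant timepoints, and any arithmetic with NA (such as `delta_assocon`) is NA as well. The sketch below shows that behaviour and a run-length based replacement for the loop. `first_run_onset` is a hypothetical helper, not part of boot or dplyr. Note also that the onsets are positions in `test_asso` and `test_con`, so they are only comparable if both tables contain the same Time bins.

```r
# which() on an all-FALSE vector returns integer(0); taking [1] of that
# gives NA. boot_alttime returns this when no run of 10 is found.
which(c(FALSE, FALSE, FALSE))[1]   # NA

# Hypothetical helper (not from boot or dplyr): index of the first run of
# at least `k` consecutive TRUEs in `x`, or NA_integer_ if there is none.
# NA entries (e.g. from a failed t-test) are treated as not significant.
first_run_onset <- function(x, k = 10) {
  x[is.na(x)] <- FALSE
  runs <- rle(x)                        # run lengths and run values
  ends <- cumsum(runs$lengths)          # last index of each run
  starts <- ends - runs$lengths + 1     # first index of each run
  hit <- which(runs$values & runs$lengths >= k)
  if (length(hit) == 0) NA_integer_ else as.integer(starts[hit[1]])
}

first_run_onset(c(FALSE, TRUE, TRUE, TRUE, FALSE), k = 3)   # 2
first_run_onset(c(TRUE, TRUE, FALSE), k = 3)                # NA

# Possible use inside boot_alttime, replacing the for loop and which():
# onset_asso <- first_run_onset(test_asso$t > 1.96)
# onset_con  <- first_run_onset(test_con$t > 1.96)
# c(onset_con - onset_asso, onset_asso, onset_con)
```

If NA here simply means "no divergence in this resample", it is a result rather than a bug. After `boot()` finishes, `colSums(is.na(CogLoad_bootres_alttime$t))` counts how many resamples produced NA for each of the three returned values.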

Description of Data:

My data consists of observations from an experiment involving participants' eye fixations while listening to sentences and looking at objects on the screen. Here's a brief overview of the columns in the (.csv) dataset:

Participant: Participant ID
Participant.Gender: Gender of the participant
Time: Time in milliseconds
Trial: Trial number
Item: Item number
sentence: Sentence text
sentencetype: Type of sentence
speakergender: Gender of the speaker
verboffset: Verb offset
verbonset: Verb onset
Audio.Gender.Consistency: Consistent if the participant gender = speaker gender, otherwise Inconsistent
condition: Experimental condition
fixated: 0 if not fixated or 1 if fixated
StrataVars: Stratification variable for the bootstrap (Participant, condition and Time combined)
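The sample rows suggest that StrataVars is Participant, condition and Time pasted together. A minimal sketch, assuming that construction; `make_strata` and `strata_sizes` are hypothetical helpers. The boot documentation describes `strata` as an integer vector or factor, hence `factor()`. For a nonparametric bootstrap, boot() resamples within each stratum, so every resample keeps the same number of rows per participant, condition and time bin, and a stratum with a single row contributes that same row every time.

```r
library(dplyr)

# Assumption: StrataVars = Participant, condition and Time pasted together,
# as in the sample rows (e.g. "5M1 Agent Consistent Target 300").
# boot() documents `strata` as an integer vector or factor.
make_strata <- function(dat) {
  dat %>% mutate(StrataVars = factor(paste(Participant, condition, Time)))
}

# Rows per stratum, smallest first. boot() resamples within each stratum,
# so a one-row stratum contributes the same row to every resample.
strata_sizes <- function(dat) {
  dat %>% count(StrataVars, name = "n_rows") %>% arrange(n_rows)
}

# Toy rows using the column names from the question
toy <- tibble(
  Participant = c("5M1", "5M1", "5M1", "3F3"),
  condition = c("Agent Consistent Target", "Agent Consistent Target",
                "Agent Consistent Target", "Agent Inconsistent Distractor"),
  Time = c(300, 300, 350, 800),
  fixated = c(1, 0, 1, 1)
)
sizes <- strata_sizes(make_strata(toy))
print(sizes)
```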

Here's a preview of a random sample of rows from my dataset:

Participant	Participant.Gender	Time	Trial	Item	sentence	sentencetype	speakergender	verboffset	verbonset	Audio.Gender.Consistency	condition	fixated	StrataVars
5M1	Male	300	46	9	I used to dream of becoming a great plumber	Gendered	Male	1265	905	Consistent	Agent Consistent Target	1	5M1 Agent Consistent Target 300
10F2	Female	1450	37	6	Later, I am going to use the new urinal	Gendered	Male.png	1700	1275	Inconsistent	Agent Consistent Target	0	10F2 Agent Consistent Target 1450
1F1P	Female	100	16	7	I really wanted to become a good princess	Gendered	Female	1587	1112	Consistent	Agent Inconsistent Target	0	1F1P Agent Inconsistent Target 100
3F3	Female	800	19	21	I have decided to buy a nice necklace	Gendered	Female	1523	1206	Consistent	Agent Inconsistent Distractor	1	3F3 Agent Inconsistent Distractor 800
4F4	Female	550	21	27	I have decided to wear the new perfume	Gendered	Female	1729	1373	Consistent	Agent Inconsistent Target	0	4F4 Agent Inconsistent Target 550
11F3	Female	900	59	25	I have decided to use the nice chainsaw	Gendered	Male	1626	1135	Inconsistent	Agent Consistent Target	0	11F3 Agent Consistent Target 900

Here's what I've tried so far to debug the issue:

Added a data constant check: I added an if_else() call inside summarise() in the bootstrap function (boot_alttime) to make sure the data are not constant within each resampled group. This check was intended to address the possibility of constant data leading to NA values in the statistics.
Checked for conflicts between dplyr and the other packages I'm using.
Checked variable types: I changed some variable types to see if that would change anything. StrataVars and Time need to be factors, and fixated needs to be numeric for the t-tests.
Changed the number of bootstrap iterations.

# Snippet I added for the constant check
summarise(
  t = if_else(length(unique(MeanFixation)) == 1, NA_real_, t.test(MeanFixation, mu = 0.5)$statistic[[1]])
)
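A likely reason this check does not help: `dplyr::if_else()` evaluates both its `true` and `false` arguments before choosing, so `t.test()` in the snippet above is still called on constant data and still throws. A scalar `if`/`else` evaluates only the branch it takes. A minimal sketch, assuming one MeanFixation value per participant and Time as in `dat_asso`; `safe_t` is a hypothetical helper name.

```r
library(dplyr)

# Hypothetical helper: one-sample t statistic, or NA when t.test() cannot run.
# A plain if/else only evaluates the branch it takes, unlike dplyr::if_else().
safe_t <- function(x, mu = 0.5) {
  x <- x[!is.na(x)]
  # t.test() needs at least two values that are not all identical
  if (length(x) < 2 || sd(x) == 0) return(NA_real_)
  # safety net for near-constant data that t.test() may still reject
  tryCatch(unname(t.test(x, mu = mu)$statistic),
           error = function(e) NA_real_)
}

# Toy participant means: constant at Time 100, variable at Time 200
toy <- tibble(
  Time = rep(c(100, 200), each = 3),
  MeanFixation = c(1, 1, 1, 0.2, 0.6, 0.9)
)

res_toy <- toy %>%
  group_by(Time) %>%
  summarise(t = safe_t(MeanFixation), .groups = "drop")
print(res_toy)
```

Inside boot_alttime this would read `summarise(t = safe_t(MeanFixation))`. An NA t-value makes the matching entry of `t_asso` or `t_con` NA, and `which()` skips NA entries, so such bins never count towards a run of significant timepoints.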

Despite these checks, the issue with NA values persists. I suspect this might be due to the lack of variability in my data, which could be problematic for either the resampling procedure or the statistical tests being applied within the boot_alttime function. However, I'm having trouble pinpointing the exact source of the problem.
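In the pipeline above, MeanFixation is the proportion of target looks among fixated rows, so in a Time bin where every participant looked only at the target all values are 1, and a bin with a single participant has only one value. `t.test()` stops with "data are essentially constant" or "not enough 'x' observations" in those cases. A sketch of a per-bin check, assuming a data frame with Time and MeanFixation columns like `dat_asso` or `dat_con`; `check_testable` is a hypothetical name.

```r
library(dplyr)

# Hypothetical diagnostic: for each time bin, how many participant means are
# there and do they vary? t.test() errors when n < 2 ("not enough 'x'
# observations") or when all values are identical ("data are essentially constant").
check_testable <- function(dat_means) {
  dat_means %>%
    group_by(Time) %>%
    summarise(
      n_participants = n(),
      sd_mean = sd(MeanFixation),          # NA when only one participant
      testable = n_participants > 1 & !is.na(sd_mean) & sd_mean > 0,
      .groups = "drop"
    )
}

# Toy data: Time 100 is constant, Time 200 has one participant, Time 300 varies
toy_means <- tibble(
  Time = c(100, 100, 200, 300, 300),
  MeanFixation = c(1, 1, 0.5, 0.2, 0.8)
)
diag_toy <- check_testable(toy_means)
print(diag_toy)
```

Running it on `dat_asso` and `dat_con` built from the full, non-resampled data shows which bins are fragile before bootstrapping; bins that are only barely testable in the original data can become constant in a resample.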

Could someone please review my code and offer insights into why the bootstrap function is returning NAs? Any suggestions on how to troubleshoot and resolve this problem would be greatly appreciated. Thank you in advance!
