I'm writing a bootstrap function in R using the 'boot' package. My aim is to test whether there is a statistically significant difference between divergence points, i.e. the points in time at which eye fixations (binary data, 0/1) on target objects diverge from fixations on competitor objects presented on the screen. Here's my code:
```r
# packages I'm using in my script
library(viridisLite)
library(boot)
library(polycor)
library(mgcv)
library(Rmisc)
library(ggplot2)
library(dplyr)
library(tidyr)

# bootstrap function
boot_alttime <- function(original_data, resample_indices) {
  dat_resample <- original_data[resample_indices, ]

  # prepare the resampled data for testing
  dat_asso <- dat_resample %>% # associative effect
    # keep only rows where either the target or competitor was fixated
    filter(fixated == 1 & condition %in% c("Agent Consistent Target", "Agent Consistent Distractor")) %>%
    # create a new variable indicating whether or not the target was fixated
    mutate(pTarget = ifelse(condition == "Agent Consistent Target", 1, 0)) %>%
    # average fixation proportions by participant and time
    dplyr::group_by(Participant, Time) %>%
    dplyr::summarise(MeanFixation = mean(pTarget))

  dat_con <- dat_resample %>% # strategic effect
    filter(fixated == 1 & condition %in% c("Agent Consistent Target", "Agent Inconsistent Target")) %>%
    # create a new variable indicating whether or not the target was fixated
    mutate(pTarget = ifelse(condition == "Agent Consistent Target", 1, 0)) %>%
    # average fixation proportions by participant and time
    dplyr::group_by(Participant, Time) %>%
    dplyr::summarise(MeanFixation = mean(pTarget))

  # apply a statistical test at each timepoint for each effect
  # test for associative effect
  test_asso <- dat_asso %>%
    dplyr::group_by(Time) %>%
    dplyr::summarise(t = t.test(MeanFixation, mu = .5)$statistic[[1]])

  # test for strategic effect
  test_con <- dat_con %>%
    dplyr::group_by(Time) %>%
    dplyr::summarise(t = t.test(MeanFixation, mu = .5)$statistic[[1]])

  # TRUE/FALSE vectors of significant positive t-scores
  # (positive means more looks to the target than to the competitor)
  t_asso <- test_asso$t > 1.96
  t_con <- test_con$t > 1.96

  # create empty vectors to store onsets
  onset_asso <- onset_con <- c()

  # flag every index that starts a run of 10 sequential TRUEs
  for (i in 1:(length(t_asso) - 10)) {
    onset_asso[i] <- sum(t_asso[i:(i + 9)]) == 10
    onset_con[i] <- sum(t_con[i:(i + 9)]) == 10
  }

  # find the difference between the earliest onsets
  delta_assocon <- which(onset_con)[1] - which(onset_asso)[1]

  # return
  # note: the bootstrap returns the indices of the respective timepoints, not absolute times.
  # The annotations to the right of each element indicate where in the boot object
  # the bootstrapped onset distributions can be found.
  c(
    delta_assocon,        # t[,1]
    which(onset_asso)[1], # t[,2]
    which(onset_con)[1]   # t[,3]
  )
}

# calling the bootstrap function
CogLoad_bootres_alttime <- boot::boot(
  # dataset to bootstrap
  data = CogLoad_dat_gen_boot,
  # bootstrap function
  statistic = boot_alttime,
  # stratification variable
  strata = CogLoad_dat_gen_boot$StrataVars,
  # number of iterations
  R = Niter) # where Niter is 2000
```
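One place the NAs could come from is the onset step itself: which(x)[1] evaluates to NA whenever x contains no TRUE, so any resample without a run of 10 significant timepoints gives an NA onset and therefore an NA difference. The sketch below illustrates this with a hypothetical helper, first_run_onset (not part of the script above), which uses rle() rather than the sliding-window loop. The input vectors are made up for illustration.

```r
# Hypothetical helper (not from the original script): index of the first
# position that starts a run of at least `run_len` consecutive TRUEs.
first_run_onset <- function(sig, run_len = 10) {
  sig <- !is.na(sig) & sig                    # treat NA test results as non-significant
  r <- rle(sig)                               # run-length encoding of the logical vector
  hit <- which(r$values & r$lengths >= run_len)
  if (length(hit) == 0) return(NA_integer_)   # no qualifying run: onset is undefined
  starts <- cumsum(c(1L, head(r$lengths, -1)))  # start index of every run
  starts[hit[1]]
}

# Made-up significance vectors
sig_a <- c(rep(FALSE, 5), rep(TRUE, 12))                   # contains a run of 12 TRUEs
sig_b <- c(rep(FALSE, 5), rep(TRUE, 4), FALSE, rep(TRUE, 4)) # longest run is only 4

onset_a <- first_run_onset(sig_a)
onset_b <- first_run_onset(sig_b)

# A single undefined onset makes the difference undefined as well
delta <- onset_b - onset_a

# The same thing happens with the which(...)[1] idiom
empty_first <- which(c(FALSE, FALSE))[1]
```

If this is what happens in the bootstrap, the NAs would be a property of individual resamples (no sustained divergence found) rather than a failure of boot() itself.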
Description of Data:
My data consists of observations from an experiment involving participants' eye fixations while listening to sentences and looking at objects on the screen. Here's a brief overview of the columns in the (.csv) dataset:
- Participant: Participant ID
- Participant.Gender: Gender of the participant
- Time: Time in milliseconds
- Trial: Trial number
- Item: Item number
- sentence: Sentence text
- sentencetype: Type of sentence
- speakergender: Gender of the speaker
- verboffset: Verb offset
- verbonset: Verb onset
- Audio.Gender.Consistency: Consistent if the participant gender = speaker gender, otherwise Inconsistent
- condition: Experimental condition
- fixated: 0 if not fixated or 1 if fixated
- StrataVars: Stratification variable for the bootstrap (Participant, condition, and Time pasted together)
Here's a preview of a few randomly sampled rows from my dataset:
| Participant | Participant.Gender | Time | Trial | Item | sentence | sentencetype | speakergender | verboffset | verbonset | Audio.Gender.Consistency | condition | fixated | StrataVars |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5M1 | Male | 300 | 46 | 9 | I used to dream of becoming a great plumber | Gendered | Male | 1265 | 905 | Consistent | Agent Consistent Target | 1 | 5M1 Agent Consistent Target 300 |
| 10F2 | Female | 1450 | 37 | 6 | Later, I am going to use the new urinal | Gendered | Male.png | 1700 | 1275 | Inconsistent | Agent Consistent Target | 0 | 10F2 Agent Consistent Target 1450 |
| 1F1P | Female | 100 | 16 | 7 | I really wanted to become a good princess | Gendered | Female | 1587 | 1112 | Consistent | Agent Inconsistent Target | 0 | 1F1P Agent Inconsistent Target 100 |
| 3F3 | Female | 800 | 19 | 21 | I have decided to buy a nice necklace | Gendered | Female | 1523 | 1206 | Consistent | Agent Inconsistent Distractor | 1 | 3F3 Agent Inconsistent Distractor 800 |
| 4F4 | Female | 550 | 21 | 27 | I have decided to wear the new perfume | Gendered | Female | 1729 | 1373 | Consistent | Agent Inconsistent Target | 0 | 4F4 Agent Inconsistent Target 550 |
| 11F3 | Female | 900 | 59 | 25 | I have decided to use the nice chainsaw | Gendered | Male | 1626 | 1135 | Inconsistent | Agent Consistent Target | 0 | 11F3 Agent Consistent Target 900 |
The problem is that the bootstrap function returns NA values. Here's what I've tried so far to debug the issue:
- Added a constant-data check: I wrapped the t-test inside 'summarise' in an if_else() call in the bootstrap function (boot_alttime), so that a group whose resampled data are constant gets NA instead of a test statistic. This was meant to address the possibility that constant data lead to NA values in the statistics.
- Checked for incompatibilities between dplyr and the other packages I'm using.
- Checked variable types: I changed some variable types to see if that would change anything. StrataVars and Time need to be factors; fixated needs to be numeric for the t-tests.
- Changed the number of bootstrap iterations.
```r
# Example of code snippet added for the constant check
summarise(
  t = if_else(length(unique(MeanFixation)) == 1, NA_real_, t.test(MeanFixation, mu = 0.5)$statistic[[1]])
)
```
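A caveat on this check, as far as I understand it: dplyr's if_else() is vectorised and evaluates both its true and false arguments, so t.test() still runs on the constant data. A plain if/else only evaluates the branch it takes. The sketch below shows such a guard in base R. The helper name safe_t is hypothetical, and treating fewer than two observations or zero variance as NA is an assumption about the desired behaviour.

```r
# Hypothetical helper (not part of the original script): one-sample t statistic
# that returns NA instead of failing when the test cannot be computed.
safe_t <- function(x, mu = 0.5) {
  x <- x[!is.na(x)]
  # fewer than 2 observations or no variability: the statistic is undefined
  if (length(x) < 2 || sd(x) == 0) {
    return(NA_real_)
  }
  # tryCatch covers remaining failures, e.g. "data are essentially constant"
  tryCatch(
    unname(t.test(x, mu = mu)$statistic),
    error = function(e) NA_real_
  )
}

t_constant <- safe_t(c(1, 1, 1))        # constant data
t_single   <- safe_t(0.7)               # a single observation
t_regular  <- safe_t(c(0.4, 0.6, 0.9))  # ordinary data
```

Inside the pipeline this could be called as dplyr::summarise(t = safe_t(MeanFixation)). Note that an NA t-score then propagates into the TRUE/FALSE significance vector unless it is handled explicitly.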
Despite these checks, the issue with NA values persists. I suspect this might be due to the lack of variability in my data, which could be problematic for either the resampling procedure or the statistical tests being applied within the boot_alttime function. However, I'm having trouble pinpointing the exact source of the problem.
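To narrow down where the NAs arise, it may help to count them per returned statistic and to identify the affected replicates in the boot object. The sketch below does this on a toy example. The data and the statistic toy_stat are made up for illustration and only mimic a statistic that is undefined for resamples without variability.

```r
library(boot)

set.seed(42)

# Toy binary data (an assumption for illustration, not the real dataset)
toy <- c(0, 0, 0, 0, 1, 1)

# Toy statistic: undefined (NA) whenever the resample has no variability
toy_stat <- function(d, i) {
  x <- d[i]
  if (sd(x) == 0) NA_real_ else (mean(x) - 0.5) / (sd(x) / sqrt(length(x)))
}

res <- boot(data = toy, statistic = toy_stat, R = 500)

# res$t has one row per replicate and one column per returned statistic
na_per_stat <- colSums(is.na(res$t))      # number of NA replicates per statistic
na_reps <- which(!complete.cases(res$t))  # which replicates contain an NA

# Index matrix of the resamples, to re-run the statistic on a failing replicate
idx <- boot.array(res, indices = TRUE)
```

With the real object, the same calls on CogLoad_bootres_alttime$t would show whether the NAs sit in one onset column or in both, and the rows of idx for the affected replicates allow a single failing resample to be re-run by hand.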
Could someone please review my code and offer insights into why the bootstrap function is returning NAs? Any suggestions on how to troubleshoot and resolve this problem would be greatly appreciated. Thank you in advance!