Say I have a data frame:
  stim1 stim2 choice outcome Feedback
1     2     1      0       0        1
2     3     2      1       1        1
3     2     3      1       0        1
4     2     3      0       1        1
My objective is to compute, at each row and for each of stim1 and stim2, the cumulative mean outcome over the previous trials in which that stimulus was chosen.
choice=0 -> stim1 was chosen.
choice=1 -> stim2 was chosen.
As an algorithm:
a) For stim=2, find all previous trials where (stim1=2 & choice=0) | (stim2=2 & choice=1)
b) Calculate the mean outcome over all such trials
For example, at trial 4 the observed outcomes for stim1 (i.e. for stimulus 2) are:
In trial 1 it was chosen (stim1=2, choice=0) and outcome=0
In trial 2 it was chosen (stim2=2, choice=1) and outcome=1
In trial 3 it was not chosen (stim1=2, but choice=1), so it is not included in the count
So the observed mean outcome is 1/2
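Steps a) and b) and the worked example above can be sketched in base R as follows. This is only an illustration: the helper name `observed_mean` is hypothetical, the data frame is typed in from the table at the top, and the Feedback filter is left out because every row in the example has Feedback = 1.

```r
# Example data, copied from the table in the question
data <- data.frame(
  stim1    = c(2, 3, 2, 2),
  stim2    = c(1, 2, 3, 3),
  choice   = c(0, 1, 1, 0),
  outcome  = c(0, 1, 0, 1),
  Feedback = c(1, 1, 1, 1)
)

# Hypothetical helper: mean outcome over the trials before row `i`
# in which stimulus `stim` was the chosen one
observed_mean <- function(data, stim, i) {
  earlier <- seq_len(nrow(data)) < i
  # a) trials where `stim` was chosen, on either side
  chosen <- (data$stim1 == stim & data$choice == 0) |
            (data$stim2 == stim & data$choice == 1)
  # b) mean outcome over those trials (NaN if there are none)
  mean(data$outcome[earlier & chosen])
}

observed_mean(data, stim = 2, i = 4)  # trials 1 and 2 count: (0 + 1) / 2
```

Calling the helper for every row and every stimulus reproduces the loop, so it has the same cost; it is meant to pin down the definition, not to be fast.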
Desired outcome:
  stim1 stim2 choice outcome Feedback Observed_Stim1 Observed_Stim2
1     2     1      0       0        1            NaN            NaN
2     3     2      1       1        1            NaN              0
3     2     3      1       0        1            1/2            NaN
4     2     3      1       1        1            1/2              0
The inefficient loop version of what I am trying to do is:
data$trial <- seq_len(nrow(data))
data$relative_stim1 <- rep(NaN, nrow(data))
data$relative_stim2 <- rep(NaN, nrow(data))
for (i in 2:nrow(data)) {
  # Earlier trials that received feedback (the column is named Feedback, capital F)
  earlier <- data$Feedback == 1 & data$trial < data$trial[i]
  # Earlier trials on which the current stim1 / stim2 was the chosen stimulus
  chose1 <- earlier & ((data$stim1 == data$stim1[i] & data$choice == 0) |
                       (data$stim2 == data$stim1[i] & data$choice == 1))
  chose2 <- earlier & ((data$stim1 == data$stim2[i] & data$choice == 0) |
                       (data$stim2 == data$stim2[i] & data$choice == 1))
  data$relative_stim1[i] <- mean(data$outcome[which(chose1)])
  data$relative_stim2[i] <- mean(data$outcome[which(chose2)])
}
The dplyr package includes several functions for cumulative operations like this. In your case, you will want to combine those with group_by() to group by stimulus.
Created on 2021-08-18 by the reprex package (v2.0.0)
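One possible reading of the suggestion to combine cumulative functions with group_by() is sketched below. It is not the answerer's original code, which is not shown here. It assumes tidyr is available alongside dplyr, that the two stimuli in a trial are always different, and that choice is always 0 or 1. The intermediate names (`long`, `counted`, `n_prev`, `sum_prev`) are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Example data, copied from the table in the question
data <- data.frame(
  stim1    = c(2, 3, 2, 2),
  stim2    = c(1, 2, 3, 3),
  choice   = c(0, 1, 1, 0),
  outcome  = c(0, 1, 0, 1),
  Feedback = c(1, 1, 1, 1)
) %>%
  mutate(trial = row_number())

# One row per (trial, stimulus), so each stimulus can be its own group
long <- data %>%
  pivot_longer(c(stim1, stim2), names_to = "position", values_to = "stim") %>%
  mutate(
    # TRUE when this stimulus was the chosen one and feedback was given
    counted = Feedback == 1 &
      ((position == "stim1" & choice == 0) | (position == "stim2" & choice == 1))
  ) %>%
  group_by(stim) %>%
  arrange(trial, .by_group = TRUE) %>%
  mutate(
    # Running totals over earlier trials only (the current trial is subtracted)
    n_prev   = cumsum(counted) - counted,
    sum_prev = cumsum(outcome * counted) - outcome * counted,
    observed = sum_prev / n_prev   # 0 / 0 gives NaN when nothing was seen yet
  ) %>%
  ungroup()

# Back to one row per trial, with one observed column per stimulus position
result <- long %>%
  mutate(position = sub("stim", "Observed_Stim", position)) %>%
  select(trial, position, observed) %>%
  pivot_wider(names_from = position, values_from = observed) %>%
  left_join(data, ., by = "trial")

result
```

Subtracting the current trial from each running total is what restricts the mean to previous trials. Each stimulus is scanned once within its group, instead of rescanning the whole data frame for every row as the loop does.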