I have a data table that captures the monthly returns for different stocks over a period of time. The data was collected in a way that if the sock was delisted, the rest of the return values are filled with zeroes (which does not align with reality). The filtering I need is the following:
for each stock: remove all lines from the end of the data collecting period which have a zero return until the first non-zero return. All other returns after the first non-zero return are valid and should not be deleted.
My idea was to order the data by date in descending order, group It by Id and then apply a function that removes all the zeroes until the first non-zero. However, I am not sure how to apply this. I wanted to use the data.table package so I created this function.
i <- 1
nzero <- i + 1
while (i <= count(stocks)+1) {
if (stocks[i,6] == 0 | is.na(stocks[i,6])) {
while (stocks[nzero,6] == 0 | is.na(stocks[nzero,6])) {
nzero <- nzero + 1
}
test89 <- rbind(test89, tocks[nzero, count(stocks)])
i <- nzero
}
else{
if (stocks[i,1] != stocks[(i+1),1]) {
test89 <- rbind(test89,stocks[(i+1):count(stocks)])
}
}
i <- i+1
}
the first column of stocks refers to the Id of the stock and the 6. refers to the return.
Does this make sense? how can I change it for it to work and how can I apply a personalized function after grouping the data?
Thanks for the support and sorry for the not so qualified question Best
Perhaps something like this is what you are looking for? I made some dummy data for 12 months where returns are either 1 or 0. First one has a run of 6 non-zero values, second has 4.