Data Filtering. Remove zero rows but only until the first non zero value for each ID

Question

Data Filtering. Remove zero rows but only until the first non zero value for each ID

156 Views Asked by MarcS At 21 April 2022 at 21:18

I have a data table that captures the monthly returns for different stocks over a period of time. The data was collected in a way that if the sock was delisted, the rest of the return values are filled with zeroes (which does not align with reality). The filtering I need is the following:

for each stock: remove all lines from the end of the data collecting period which have a zero return until the first non-zero return. All other returns after the first non-zero return are valid and should not be deleted.

My idea was to order the data by date in descending order, group It by Id and then apply a function that removes all the zeroes until the first non-zero. However, I am not sure how to apply this. I wanted to use the data.table package so I created this function.

i <- 1
nzero <- i + 1

while (i <= count(stocks)+1) {
  if (stocks[i,6] == 0 | is.na(stocks[i,6])) {
    while (stocks[nzero,6] == 0 | is.na(stocks[nzero,6])) {
      nzero <- nzero + 1
    }
    test89 <- rbind(test89, tocks[nzero, count(stocks)])
    i <- nzero
  }
  else{
    if (stocks[i,1] != stocks[(i+1),1]) {
      test89 <- rbind(test89,stocks[(i+1):count(stocks)])
    }
  }
  i <- i+1
}

the first column of stocks refers to the Id of the stock and the 6. refers to the return.

Does this make sense? how can I change it for it to work and how can I apply a personalized function after grouping the data?

Thanks for the support and sorry for the not so qualified question Best

Original Q&A

There are 1 best solutions below

**Andrew Brown** · Answer 1 · 2022-04-21T22:03:07.297000

Perhaps something like this is what you are looking for? I made some dummy data for 12 months where returns are either 1 or 0. First one has a run of 6 non-zero values, second has 4.

library(data.table)

stocks <- data.table(
  stock = do.call('c', lapply(letters[1:2], rep, 12)),
  month = rep(1:12, 2),
  return = c(rep(1, 6), rep(0, 6),
             rep(1, 4), rep(0, 8))
)

# reverse date order
stocks <- stocks[order(-month)]

# calculate (first) run of zero values
zero_run_length <- function(x) {
  r <- rle(x)
  if (r$values[1] == 0) {
    return(r$lengths[1])
  } else return(0)
}

# calculate "zero run length" (on reversed order table)
zero_idx <- stocks[, list(zrl = zero_run_length(return)), by = "stock"]
zero_idx
#>    stock zrl
#> 1:     b   8
#> 2:     a   6

# select target rows
stocks2 <- stocks[zero_idx, on = "stock"][, .SD[((zrl[1] + 1):.N),], by = "stock"]

# put back in order
stocks2[order(stock, month),]
#>     stock month return zrl
#>  1:     a     1      1   6
#>  2:     a     2      1   6
#>  3:     a     3      1   6
#>  4:     a     4      1   6
#>  5:     a     5      1   6
#>  6:     a     6      1   6
#>  7:     b     1      1   8
#>  8:     b     2      1   8
#>  9:     b     3      1   8
#> 10:     b     4      1   8

Data Filtering. Remove zero rows but only until the first non zero value for each ID

There are 1 best solutions below

Related Questions in R

Related Questions in DATA-FILTERING

Trending Questions

Popular # Hahtags

Popular Questions