How to find and describe consecutive NA-values in a time series?

520 Views Asked by LGe At 25 August 2021 at 09:12

In order to cluster time series by certain criteria, I want to measure the length of the gaps inside each series. These gaps may indicate various business problems, and an experienced analyst should look at them before the values are fed into the predictive model. I know how to identify, count, and replace NAs; that is not what I want to do here. Rather, I want to get the length of each run of consecutive NA values (including runs of length one) as a vector per time series.

Some code to illustrate my problem and the desired outcome:

suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(lubridate))

dates <- ymd("2016-01-01")+ months(0:71)
fake_values <- 
    c(661,678,1094,1987,3310,2105,1452,983,1107,805,675,684,436,514,668,206,19,NA,NA,NA,1174,1760,735,366,
    510,580,939,1127,2397,1514,1370,832,NA,661,497,328,566,631,983,1876,2784,2928,2543,1508,1175,8,1733,
    862,779,1112,1446,2407,3917,2681,NA,NA,NA,NA,NA,1239,1581,2814,1419,2792,4899,5857,2450,1944,1464,1248,
    1533,NA)

# build the data frame with explicit column names
df <- tibble(values = fake_values, dates = dates)

# desired result: one vector of gap lengths per time series
result <- as.data.frame(c(3,1,5,1)) 
names(result)[1] <- "some_products_gaps"

By the way, I would prefer a tidy solution. I hope that is not asking too much. The solution must be scalable, as I need to loop over or apply it to thousands of time series. Thanks in advance!


There are 2 best solutions below

Ronak Shah On 25 August 2021 at 10:08 BEST ANSWER

With the help of rleid() from data.table, here is a dplyr solution.

library(dplyr)

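df %>%
  # number each run of consecutive NA / non-NA values
  mutate(grp = data.table::rleid(is.na(values))) %>%
  # keep only the rows that belong to NA runs
  filter(is.na(values)) %>%
  # count the rows per run, i.e. the length of each gap
  count(grp, name = 'some_products_gaps') %>%
  select(-grp)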

#  some_products_gaps
#               <int>
#1                  3
#2                  1
#3                  5
#4                  1
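Since the question asks for something that scales to many series, the same idea might be extended with a grouping column. The following is only a sketch under assumptions: the long-format tibble long_df, its series column, and the output columns gap_start and gap_length are hypothetical names that do not appear in the question.

```r
library(dplyr)

# Hypothetical long-format data: one row per series and date
long_df <- tibble(
  series = rep(c("a", "b"), each = 6),
  dates  = rep(seq(as.Date("2021-01-01"), by = "month", length.out = 6), times = 2),
  values = c(1, NA, NA, 4, NA, 6,
             NA, NA, NA, 4, 5, NA)
)

gaps <- long_df %>%
  group_by(series) %>%
  # number the runs separately within each series
  mutate(grp = data.table::rleid(is.na(values))) %>%
  filter(is.na(values)) %>%
  group_by(series, grp) %>%
  # one row per gap: where it starts and how many rows it spans
  summarise(gap_start = min(dates), gap_length = n(), .groups = "drop") %>%
  select(-grp)

print(gaps)
```

Keeping the start date next to the length makes it easier for an analyst to look up each gap in the original series.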

Triss On 25 August 2021 at 09:58

This should do it:

x <- is.na(df$values)
runs <- rle(x)              # run-length encoding of the NA indicator
runs$lengths[runs$values]   # lengths of the runs where the value is NA
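
To apply this to many series, the rle() approach could be wrapped in a small function and mapped over a list. This is a minimal sketch, assuming the series are held in a named list; na_gap_lengths and series_list are hypothetical names used only for illustration.

```r
# Hypothetical helper: lengths of every run of consecutive NAs in a vector
na_gap_lengths <- function(x) {
  runs <- rle(is.na(x))        # compute the run-length encoding once
  runs$lengths[runs$values]    # keep only the runs where is.na() was TRUE
}

# Hypothetical named list standing in for thousands of series
series_list <- list(
  product_a = c(1, NA, NA, NA, 5, NA, 7),
  product_b = c(NA, 2, 3, NA, NA),
  product_c = c(1, 2, 3)
)

# One integer vector of gap lengths per series
gaps <- lapply(series_list, na_gap_lengths)
print(gaps)
```

A series without any NA yields an empty integer vector, so downstream code should be prepared for zero-length results.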
