Keep the identifier for each row in a rolling window of functions R


I have managed to apply a rolling window of functions to each column of my accelerometer data, where each row is associated with a behaviour. However, the output doesn't include the associated behaviour. Is there a way to either keep the behaviour for each window in the dataset OR attach the associated behaviour to the rownames?


    library(zoo)  # provides rollapply()

    df1 <- read.table(text = "
Time    X_accel  y_accel   Behaviour
1   0.01  0.1  Standing
2   0.01  0.2  Standing
3   0.01  0.2  Standing
4   0.02  0.1  Standing
5   0.06  0.8  Walking
6   0.07   0.8 Walking
7   0.01  0.2  Standing
8   0.02  0.2  Standing
9   0.01  0.1  Standing
10  0.9   0.95 Flying            
11  0.95  0.99 Flying
12  0.9   0.95 Flying",
                  header = TRUE)

# Sampling frequency (Hz)
sampling.interval = 2
# Duration of epoch required (seconds)
duration = 1
# Window step required (0-1: fraction of the window to move forward)
step = 0.1
# Set window size for rolling windows (number of rows per window)
window.size = sampling.interval*duration
# Set step size for rolling windows; rollapply's `by` counts whole rows,
# so round to an integer and never step by less than one row
window.step = max(1, round(step*window.size))
####

# Summary statistics for the values inside one window
time_domain_summary <- function(values) {
  
  features <- data.frame(
    mean = mean(values, na.rm = TRUE),
    median = quantile(values, probs = c(0.5), na.rm = TRUE), 
    mx = max(values, na.rm = TRUE),
    mn = min(values, na.rm = TRUE), 
    sd = sd(values, na.rm = TRUE),
    range = max(values, na.rm = TRUE) - min(values, na.rm = TRUE)
    
  )
  return(features)
}  

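# Run id: increases by one every time Behaviour changes from one row to the next
f <- cumsum(c(0, df1$Behaviour[-1] != df1$Behaviour[-nrow(df1)]))

# Apply the rolling summary within each run of identical behaviour
# (named `windows` rather than `list` to avoid masking base::list)
windows <- by(df1[,c(2:3)], f, \(x) {
  rollapply(x, FUN = time_domain_summary, width = window.size, by = window.step,  align = c("left"), partial = FALSE) 
  })

df2 <- do.call(rbind, lapply(windows, data.frame))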

colnames(df2) <- c("x_mean", "x_median", "x_mx", "x_mn", "x_sd", "x_range", "y_mean", "y_median", "y_mx", "y_mn", "y_sd", "y_range")

Output = df2

I would like EITHER the rownames labelled by behaviour (0.1-0.3 "Standing", 1 "Walking", 2.1-2.2 "Standing", 3.1-3.2 "Flying", and so on) OR an extra column that holds the associated behaviour. This is a simple example: the actual dataset has tens of thousands of rows, and behaviours last varying amounts of time; any run shorter than the set window is dropped from the output. I'm looking for a way to identify the output rows by the behaviour associated with them in df1, regardless of the size of the dataset, if possible.
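To make the second option concrete, here is a minimal base R sketch of what an output with a Behaviour column could look like. It does not use zoo: the windows are built by hand so the behaviour can be attached while each window is summarised. The helper `summarise_run`, the `start_time` column, and the reduced set of statistics are assumptions made for illustration, and the window size and step are fixed at 2 rows and 1 row to match the toy data.

```r
# Toy data mirroring df1 from the question
df1 <- data.frame(
  Time = 1:12,
  X_accel = c(0.01, 0.01, 0.01, 0.02, 0.06, 0.07, 0.01, 0.02, 0.01, 0.9, 0.95, 0.9),
  y_accel = c(0.1, 0.2, 0.2, 0.1, 0.8, 0.8, 0.2, 0.2, 0.1, 0.95, 0.99, 0.95),
  Behaviour = c(rep("Standing", 4), rep("Walking", 2),
                rep("Standing", 3), rep("Flying", 3)),
  stringsAsFactors = FALSE
)

window.size <- 2  # rows per window (assumed)
window.step <- 1  # rows to move forward between windows (assumed)

# Run id: increases by one every time Behaviour changes
run_id <- cumsum(c(0, df1$Behaviour[-1] != df1$Behaviour[-nrow(df1)]))

# Hypothetical helper: summarise every window in one run and
# record the behaviour and start time alongside the statistics
summarise_run <- function(d) {
  n <- nrow(d)
  if (n < window.size) return(NULL)  # runs shorter than one window are dropped
  starts <- seq(1, n - window.size + 1, by = window.step)
  rows <- lapply(starts, function(s) {
    w <- d[s:(s + window.size - 1), ]
    data.frame(
      Behaviour = w$Behaviour[1],
      start_time = w$Time[1],
      x_mean = mean(w$X_accel), x_sd = sd(w$X_accel),
      y_mean = mean(w$y_accel), y_sd = sd(w$y_accel),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# One data frame per run, then stack them; the run ids become rowname prefixes
df2 <- do.call(rbind, lapply(split(df1, run_id), summarise_run))
print(df2)
```

Because the behaviour is taken from the rows inside each window, the labelling does not depend on the size of the dataset or on how long each behaviour lasts. The same idea should carry over to the full set of statistics by adding them to the inner `data.frame()` call.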
