Improve code speed performance - ‘by’ and ‘approx’ functions in R


I have calculated oxygen profile (cop) data by date and depth. The depth intervals differ between dates, so I need to calculate the oxygen at rounded depths (column “Depth2”) by linear interpolation. I have done this with ‘by’ and the ‘approx’ function. It works fine but is a bit slow (about 3.5 seconds for the data set in the code below). I am looking for a way to speed up the calculation. Any suggestions? I hope loading the data from Google Drive works smoothly.

# download the data from Google Drive
id <- "1p1wiw8NS-oMCI5RDZNaHc-55EpWSFEv8"  # Google Drive file ID
cop <- read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", id))

start_time <- Sys.time()

# calculate linear approx to the rounded depth and leave only unique result per date
df_list <- by(cop, cop$Date, function(sub) {
  copa <- approx(x=sub$Depth, y=sub$Oxygen, xout=sub$Depth2, rule=2)
  
  df <- unique(data.frame(Date = sub$Date, 
                          Depth2 = sub$Depth2,
                          O2 = copa$y,
                          stringsAsFactors = FALSE))
  return(df)
})    

cop2 <- do.call(rbind, unname(df_list))

end_time <- Sys.time()
end_time - start_time
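
For readers who want to see what the per-date step does, here is a minimal sketch on a toy profile for a single date. The numbers are made up for illustration and are not taken from the ‘cop’ data set. With rule = 2, target depths outside the measured range get the value at the nearest measured depth instead of NA.

```r
# Toy profile for one date (hypothetical values, for illustration only)
depth  <- c(0.4, 1.6, 2.7, 4.1)  # irregular measured depths
oxygen <- c(9.0, 8.4, 7.3, 5.2)  # oxygen measured at those depths
depth2 <- c(0, 2, 3, 4, 5)       # rounded target depths

# Linear interpolation at the target depths; rule = 2 clamps
# out-of-range targets (0 and 5 here) to the nearest end value
res <- approx(x = depth, y = oxygen, xout = depth2, rule = 2)

toy <- data.frame(Depth2 = res$x, O2 = res$y)
print(toy)
```

Depth 0 lies above the shallowest measurement and depth 5 below the deepest one, so both take the end values (9.0 and 5.2), while depths 2, 3 and 4 are interpolated between their neighbours.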

Rui Barradas

The following version is about 20% faster than the posted ‘by’-based version. That is not much, because most of the time is spent in approx.

start_time2 <- Sys.time()
M <- as.matrix(cop[c("Depth", "Depth2", "Oxygen")])
inx <- split(seq_len(nrow(cop)), cop$Date)  # row indices grouped by date
df_list3 <- lapply(inx, function(i){
  copa <- approx(
    x = M[i, "Depth"],
    y = M[i, "Oxygen"],
    xout = M[i, "Depth2"],
    rule = 2
  )

  df <- unique(data.frame(copa))
  df$Date <- cop[ i[1], "Date"]
  df
})
cop3 <- do.call(rbind, unname(df_list3))[c(3, 1:2)]
names(cop3)[2:3] <- c("Depth2", "O2")
end_time2 <- Sys.time()

identical(cop2, cop3)  # FALSE
all.equal(cop2, cop3)  # TRUE

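# elapsed times in seconds (explicit units, since difftime chooses them automatically)
t1 <- as.numeric(difftime(end_time, start_time, units = "secs"))
t2 <- as.numeric(difftime(end_time2, start_time2, units = "secs"))
100*(t1 - t2)/t1  # percentage speed-up relative to the 'by' version
#[1] 20.46678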

rm(M, inx)  # final clean up
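
One further variation that may be worth trying, offered only as a sketch: deduplicate the target depths before calling approx, and build a single data.frame at the end rather than one per date. The data set below (‘cop_demo’) is a synthetic stand-in with the same column names as in the question, so the code runs without the download. Whether this is actually faster on the real data has not been measured here.

```r
# Synthetic stand-in for 'cop' (made-up values; column names follow the question)
set.seed(1)
dates <- as.character(as.Date("2020-01-01") + 0:49)
n_per <- 40
cop_demo <- data.frame(
  Date   = rep(dates, each = n_per),
  Depth  = unlist(lapply(dates, function(d) sort(runif(n_per, 0, 20)))),
  Oxygen = runif(length(dates) * n_per, 0, 10),
  stringsAsFactors = FALSE
)
cop_demo$Depth2 <- round(cop_demo$Depth)

# Reference: the 'by'-based approach from the question
ref <- do.call(rbind, unname(by(cop_demo, cop_demo$Date, function(sub) {
  a <- approx(x = sub$Depth, y = sub$Oxygen, xout = sub$Depth2, rule = 2)
  unique(data.frame(Date = sub$Date, Depth2 = sub$Depth2, O2 = a$y,
                    stringsAsFactors = FALSE))
})))

# Variation: interpolate only at the unique target depths of each date
idx <- split(seq_len(nrow(cop_demo)), cop_demo$Date)
parts <- lapply(idx, function(i) {
  xo <- unique(cop_demo$Depth2[i])
  a <- approx(x = cop_demo$Depth[i], y = cop_demo$Oxygen[i], xout = xo, rule = 2)
  list(Date = rep(cop_demo$Date[i[1]], length(xo)), Depth2 = xo, O2 = a$y)
})

# Assemble one data.frame at the end instead of one per date
alt <- data.frame(
  Date   = unlist(lapply(parts, `[[`, "Date"), use.names = FALSE),
  Depth2 = unlist(lapply(parts, `[[`, "Depth2"), use.names = FALSE),
  O2     = unlist(lapply(parts, `[[`, "O2"), use.names = FALSE),
  stringsAsFactors = FALSE
)

# Both approaches should give the same rows in the same order
stopifnot(identical(ref$Date, alt$Date),
          identical(ref$Depth2, alt$Depth2),
          isTRUE(all.equal(ref$O2, alt$O2)))
```

To compare timings on the real data, replace ‘cop_demo’ with ‘cop’ and wrap each variant in system.time(). The row names of the two results differ, so compare them column by column or with all.equal rather than identical.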