I have calculated oxygen profile (cop) data by date and depth. The depth intervals differ between dates, so I need to compute the oxygen value at rounded depths (column "Depth2") by linear interpolation. I did this with 'by' and the approx function. It works fine, but it is a bit slow (about 3.5 seconds for the data set in the code below). I am looking for a way to speed up the calculation. Any suggestions? I hope the data download from Google Drive works smoothly.
# download data from Google Drive
id <- "1p1wiw8NS-oMCI5RDZNaHc-55EpWSFEv8"  # Google file ID
cop <- read.csv(sprintf("https://docs.google.com/uc?id=%s&export=download", id))
start_time <- Sys.time()
# linearly interpolate oxygen at the rounded depths and keep one unique row per date and depth
df_list <- by(cop, cop$Date, function(sub) {
  # rule = 2: outside the measured depth range, use the value at the nearest end point
  copa <- approx(x = sub$Depth, y = sub$Oxygen, xout = sub$Depth2, rule = 2)
  df <- unique(data.frame(Date = sub$Date,
                          Depth2 = sub$Depth2,
                          O2 = copa$y,
                          stringsAsFactors = FALSE))
  return(df)
})
# stack the per-date results into one data frame
cop2 <- do.call(rbind, unname(df_list))
end_time <- Sys.time()
end_time - start_time
The following version is 0% faster than the posted by-based version. Not much; most of the time is spent in approx.
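The answer's code is not included in the text above. As a rough sketch of one way such a version could look (an assumption, not necessarily the answerer's code), the groups can be built with split and lapply, and approx can be called only on the unique rounded depths of each date so that unique() on a full data frame is no longer needed. The data frame cop_demo and the helper interp_one are hypothetical names, and the data is synthetic so that the example runs without the download.

```r
# Synthetic stand-in for the `cop` data (made-up values, not the real file)
set.seed(1)
dates <- as.character(as.Date("2020-01-01") + 0:49)
cop_demo <- do.call(rbind, lapply(dates, function(d) {
  depth <- sort(runif(200, 0, 40))
  data.frame(Date = d,
             Depth = depth,
             Depth2 = round(depth),
             Oxygen = 10 - 0.2 * depth + rnorm(200, sd = 0.1),
             stringsAsFactors = FALSE)
}))

# Hypothetical helper: interpolate once per unique rounded depth of one date
interp_one <- function(sub) {
  xout <- unique(sub$Depth2)
  data.frame(Date = sub$Date[1],
             Depth2 = xout,
             O2 = approx(x = sub$Depth, y = sub$Oxygen, xout = xout, rule = 2)$y,
             stringsAsFactors = FALSE)
}

# split() builds the per-date groups; lapply() applies the helper to each
res_list <- lapply(split(cop_demo, cop_demo$Date), interp_one)
cop2_demo <- do.call(rbind, unname(res_list))
```

Wrapping both variants in system.time() on the real data would show whether this helps. Since most of the time is reported to be spent in approx, any gain is likely to be modest.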