The title really sums up my question: `apply` seems to drop the CRS, but other functions don't. What is the best way to calculate a geographic function (here, `st_distance`) on a vector of points?
library(tidyverse)
library(sf)
# Generate 1000 lat longs, save as sf, and set crs
df1 <- data.frame(lat = runif(1000, 30, 33.4), long = runif(1000, -95, -82)) %>%
  st_as_sf(coords = c("long", "lat"),
           crs = 4326)
# Single point, with identical crs
df2 <- data.frame(lat = 32, long = -96) %>%
  st_as_sf(coords = c("long", "lat"),
           crs = 4326)
apply(df1, 1, function(x) st_distance(x, df2))
This gives the error: Error in st_distance(x, df2) : st_crs(x) == st_crs(y) is not TRUE
But these both work fine:
st_distance(df1[1,], df2)
final.df <- NULL
for(i in 1:nrow(df1)){
  ith.distance <- st_distance(df1[i,], df2)
  final.df <- rbind(final.df, ith.distance)
}
The for loop is certainly not the most efficient way to do this. ...Is it??
Points:
- `for` and `apply` loops have no real difference in performance; that's an old condition (it used to be true) that was fixed many years ago. For the most part, the decision to use one over the other should be made for data-wrangling and proficiency reasons, not "speed". Typically, when there are problems with `for`-loop performance, it has everything to do with what is done inside the loop and nothing to do with `for` itself.
- Iteratively calling `final.df <- rbind(final.df, ith.distance)` is going to perform very poorly in the long haul. With each iteration, it makes a complete copy of all data; this is fine when you have a few rows, but as you get into the 100s and 1000s, each pass through the loop takes a little more time than the last. Don't do this; it's better to append the results to a `list` and do a single call to `rbind`. I'll demonstrate this later.
- It is generally more efficient to loop over the smaller of the frames; in this case, that would be `df2` instead of `df1`. (Perhaps this is just a matter of the sample data you generated, but I'm saying it just in case.)

I suggest iterating over row numbers and doing your distance calculation that way.
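A minimal sketch of the row-number approach and the alternatives discussed above, assuming the sf package (tidyverse is not needed here). This is not the answer's original code: the data are rebuilt in base R with an added `set.seed()`, and `out` and `res3` are hypothetical names for the list-then-single-`rbind` loop.

```r
library(sf)

set.seed(1)  # added for reproducibility (not in the question)

# Rebuild the question's data in base R: 1000 random points plus one target point
df1 <- st_as_sf(data.frame(lat = runif(1000, 30, 33.4), long = runif(1000, -95, -82)),
                coords = c("long", "lat"), crs = 4326)
df2 <- st_as_sf(data.frame(lat = 32, long = -96),
                coords = c("long", "lat"), crs = 4326)

# res1: iterate over row numbers; df1[i, ] is still an sf object with its CRS.
# sapply() simplifies the 1 x 1 results into a plain numeric vector (units dropped).
res1 <- sapply(seq_len(nrow(df1)), function(i) st_distance(df1[i, ], df2))

# res2: a single vectorized call; returns a 1000 x 1 matrix of distances in metres
res2 <- st_distance(df1, df2)

# res3 (hypothetical name): a for loop that fills a preallocated list
# and calls rbind() once at the end, instead of growing final.df each pass
out <- vector("list", nrow(df1))
for (i in seq_len(nrow(df1))) {
  out[[i]] <- st_distance(df1[i, ], df2)
}
res3 <- do.call(rbind, out)

# Check that the three approaches agree on the distances
all.equal(res1, as.numeric(res2))
all.equal(res1, as.numeric(res3))
```

Wrapping each approach in `system.time()` is an easy way to compare their speed on your own data.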
`res1` is a vector, `res2` is a matrix (1 column), and the values are identical. The latter ran much faster; this is the result of using R's vectorized calculations when available (and they are here).

For the record, here is a `for` loop that avoids the per-pass `rbind`.

As to why `apply` doesn't work: it comes down to what is provided internally to the function.

If you look at the start of `apply` (src/library/base/R/apply.R#29), one of the first things it does is convert the input to a `matrix` with `as.matrix(X)`. Among other things, this strips the CRS from the `"sf"`-class object.

(For the record, this call to `as.matrix(X)` is also a problem when doing row-wise operations on a frame with mixed `character` and non-string columns; if `X` is not sufficiently subsetted before `apply`, then all values may be converted to strings.)
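A small sketch of the `as.matrix()` effect described above, assuming the sf package; `pts`, `m`, and `mixed` are made-up example objects. It checks that the matrix `apply()` works on is no longer an `sf` object, that the rows handed to the function are neither `sf` nor `sfc` (so `st_distance()` has no CRS to match), that subsetting by row number keeps the CRS, and that mixed column types get coerced to strings.

```r
library(sf)

# Made-up example: three points in WGS84 (EPSG:4326)
pts <- st_as_sf(data.frame(long = c(-95, -90, -85), lat = c(30, 31, 32)),
                coords = c("long", "lat"), crs = 4326)

# apply() calls as.matrix() on classed 2-D input such as a data frame; do it by hand
m <- as.matrix(pts)
m_is_sf <- inherits(m, "sf")   # FALSE: the "sf" class (and its CRS) is gone

# Each row that apply() passes to FUN is neither an sf nor an sfc object,
# so st_distance() has no matching CRS to compare against
row_is_sf <- apply(pts, 1, function(x) inherits(x, "sf") || inherits(x, "sfc"))

# Subsetting by row number, as in pts[1, ], keeps the sf class and the CRS
crs_kept <- st_crs(pts[1, ]) == st_crs(pts)

# Mixed column types: as.matrix() turns every value into a string
mixed <- data.frame(id = 1:2, name = c("a", "b"))
id_class <- apply(mixed, 1, function(r) class(r[["id"]]))   # "character" for each row
```

This is why iterating over row numbers (or calling `st_distance()` once on the whole object) works where `apply()` fails.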