Find nearest line to point by group and Date in R?


I am working with line data and point data that differ by day. The ultimate goal is to find the nearest line segment to each point. However, different lines and points are relevant on different days. The points are similar to GPS pings, and the line segments they can be matched to vary by day. For example, for 1/1/2001 we might have one point and only line segments 1 and 3 (out of 5) available, and I want to know which of those two lines (rather than all 5) the point is closest to. In addition, the points and lines vary by person.

I've been able to find the closest line for a single example individual, but I'm struggling with how to do this for a dataset that has thousands of individual-specific points and line segments. Here is some example pseudo data (I have two data frames, one for the lines and one for the points):


ID <-c("1","1","1","2","2")
Date <- c("1/1/2001",
              "1/1/2001",
              "1/2/2001",
              "1/1/2001",
              "1/2/2001")

              
lat <- c(34,36,41,50,20)
long <- c(55,50,-89,-175,-155)
points <- data.frame(ID, Date, lat, long)

ID <-c("1","1","1","1","2","2","2","2")
Date <- c("1/1/2001",
          "1/1/2001",
          "1/2/2001",
          "1/2/2001",
          "1/1/2001",
          "1/1/2001",
          "1/2/2001",
          "1/2/2001")


BegLat <- c(60,55,43,75,60,55,44,88)
BegLon <- c(55,50,-89,-135,-100,-155,-130,-80)
EndLat <- c(36,75,55,80,65,60,42,90)
EndLon <- c(75,60,-89,-75,-123,-140,-120,-77)


lines <- data.frame(ID, Date, BegLat, BegLon, EndLat, EndLon)

The lines data frame, as you can see, doesn't actually contain a linestring or any other geometry object, so this is what I did to create a linestring per group (and to prepare the points for matching):

library(dplyr)
library(tidyr)
library(sf)

lines <- lines %>%
  dplyr::mutate(lineid = dplyr::row_number()) %>% # one id per segment; Date and ID are carried along unchanged
  unite(start, BegLon, BegLat) %>% # collect coords into one column for reshaping
  unite(end, EndLon, EndLat) %>%
  gather(start_end, coords, start, end) %>% # reshape to long
  separate(coords, c("LON", "LAT"), sep = "_", convert = TRUE) %>% # split the text coordinates back into numeric columns
  st_as_sf(coords = c("LON", "LAT"), crs = 4326) %>% # same CRS as the points below
  dplyr::group_by(lineid, Date, ID) %>%
  dplyr::summarise() %>% # union each segment's two points into one MULTIPOINT per lineid
  st_cast("LINESTRING") # cast each MULTIPOINT to a LINESTRING

points <- st_as_sf(points, coords = c("long", "lat"), crs = 4326)

Where I get stuck is accounting for the individual ID (and Date) when finding the closest line to each point. Here is the code that gets me the closest line without accounting for Date or ID:

lines.sp <- as_Spatial(st_geometry(lines), IDs = as.character(lines$lineid))
points.sp <- as_Spatial(st_geometry(points))

dist <- as.data.frame(geosphere::dist2Line(p = points.sp, line = lines.sp))

The above code successfully finds the closest line for the points, but across all IDs and all line segments.

The one thing I've tried, but which isn't quite working, is a for loop. The problem with a for loop is that I would need to compare each point with each candidate line for each day, and that could take a while. Is there a way to use dplyr or some other package to do this in a more efficient way?
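To make the goal concrete, here is a minimal sketch of one possible direction, not a tested solution for the full dataset. It assumes that splitting both layers on a combined ID/Date key and calling `sf::st_nearest_feature()` once per key is acceptable, so each point is only compared with the lines that share its ID and Date. The names `points_sf`, `lines_sf`, `points_by_key`, `lines_by_key`, `nearest_lineid` and `dist_m` are made up for illustration, and the linestrings are built directly with `st_linestring()` rather than with the reshaping pipeline above.

```r
library(sf)

# Pseudo data copied from the question
points <- data.frame(
  ID   = c("1", "1", "1", "2", "2"),
  Date = c("1/1/2001", "1/1/2001", "1/2/2001", "1/1/2001", "1/2/2001"),
  lat  = c(34, 36, 41, 50, 20),
  long = c(55, 50, -89, -175, -155)
)
lines <- data.frame(
  ID     = c("1", "1", "1", "1", "2", "2", "2", "2"),
  Date   = c("1/1/2001", "1/1/2001", "1/2/2001", "1/2/2001",
             "1/1/2001", "1/1/2001", "1/2/2001", "1/2/2001"),
  BegLat = c(60, 55, 43, 75, 60, 55, 44, 88),
  BegLon = c(55, 50, -89, -135, -100, -155, -130, -80),
  EndLat = c(36, 75, 55, 80, 65, 60, 42, 90),
  EndLon = c(75, 60, -89, -75, -123, -140, -120, -77)
)
lines$lineid <- seq_len(nrow(lines))

# Build one two-vertex LINESTRING per row (x = longitude, y = latitude)
line_geoms <- lapply(seq_len(nrow(lines)), function(i) {
  st_linestring(rbind(
    c(lines$BegLon[i], lines$BegLat[i]),
    c(lines$EndLon[i], lines$EndLat[i])
  ))
})
lines_sf  <- st_sf(lines[c("ID", "Date", "lineid")],
                   geometry = st_sfc(line_geoms, crs = 4326))
points_sf <- st_as_sf(points, coords = c("long", "lat"), crs = 4326)

# Split both layers on the same combined ID/Date key
points_by_key <- split(points_sf, paste(points_sf$ID, points_sf$Date))
lines_by_key  <- split(lines_sf,  paste(lines_sf$ID,  lines_sf$Date))

# For each key, match every point to the nearest line within that key only
nearest <- do.call(rbind, lapply(names(points_by_key), function(k) {
  pts <- points_by_key[[k]]
  lns <- lines_by_key[[k]]
  if (is.null(lns)) {
    # No candidate lines for this ID/Date: keep the points, mark as missing
    pts$nearest_lineid <- NA_integer_
    pts$dist_m <- NA_real_
    return(pts)
  }
  idx <- st_nearest_feature(pts, lns)  # row index into lns for each point
  pts$nearest_lineid <- lns$lineid[idx]
  pts$dist_m <- as.numeric(st_distance(pts, lns[idx, ], by_element = TRUE))
  pts
}))

print(st_drop_geometry(nearest))
```

The loop here runs once per ID/Date combination rather than once per point/line pair, and the distance is computed only for the matched pairs. Whether this is fast enough for thousands of individuals is something I have not measured.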
