Calculate distance and time between points along an animal movement path


I have a large dataset (> 9 million rows) of times and locations when individual animals were detected at stations. I would like to calculate the distance between each station along each animal's path as it travelled between stations, as well as the time it took to travel between stations. And then I would like to summarize the total distance and time across all sections of the path.

For each individual, the dataset has one record for each time it was detected at a stationary point. If the individual stayed at that point for a long, continuous period, there are multiple records (each ~30 s apart) for that period.

I can summarize the data below to get 1 row for each time an individual was at a station (see below). However, the output doesn't recognize when an individual travels to the same station more than once.

E.g.

library(dplyr)

id <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B")
site <- c("a", "a", "b", "a", "c", "c", "c", "d", "a", "b")
time <- seq(1:10)
lat <- c(1, 1, 2, 1, 3, 3, 3, 4, 1, 2)
lon <- c(1, 1, 2, 1, 3, 3, 3, 4, 1, 2)

df <- data.frame(id, site, time, lat, lon)

df %>% group_by(id, site, lat, lon) %>%
  summarize(timeStart = min(time), 
            timeEnd = max(time))

# A tibble: 6 x 6
# Groups:   id, site, lat [?]
  id    site    lat   lon timeStart timeEnd
  <fct> <fct> <dbl> <dbl>     <dbl>   <dbl>
1 A     a         1     1         1       4
2 A     b         2     2         3       3
3 A     c         3     3         5       7
4 A     d         4     4         8       8
5 B     a         1     1         9       9
6 B     b         2     2        10      10

I need an approach to group the data so that multiple visits to the same station (with trips to other stations in between) are each recognized as a separate "leg" of the trip.

Then, I need to calculate the great-circle distance between consecutive stations, as well as the time difference between timeEnd (1st station) and timeStart (2nd station).

Answer by Dave2e (best answer)

This may not be your complete solution, but it is a good start. It finds the distance and time difference between consecutive rows of data, and sets both values to NA for the first row of each id and for rows where the site has not changed from the previous row.

df <- data.frame(id, site, time, lat, lon)

library(geosphere)
library(dplyr)

#sort data by id and time
df<-df[order(df$id, df$time), ]
#find distance between each point in column
# Note longitude is the first column
df$distance<-c(NA, distGeo(df[,c("lon", "lat")]))
#find delta time between each row for each id
df<-df %>% group_by(id) %>% mutate(dtime=case_when(site != lag(site) ~ time-lag(time),
                                               TRUE ~ NA_integer_))
#remove distances where no delta time was computed (first row of each id, or same site as the previous row)
df$distance[is.na(df$dtime)]<-NA

#id summary
df%>% summarize(disttraveled=sum(distance, na.rm=TRUE), totaltime=sum(dtime, na.rm=TRUE))

Answer by Henrik

First, the data.table function rleid is used to create a grouping variable: for each individual, each change of site represents a new group. Within each group, calculate the desired stats:

library(data.table)
library(geosphere)
setDT(df)
df2 <- df[ , .(id = id[1],
               site = site[1],
               lat = lat[1],
               lon = lon[1],
               first_time = min(time),
               last_time = max(time)),
           by = .(id_site = rleid(id, site))]
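To see what the run-length grouping does, here is a minimal base R sketch that mimics the idea with rle() for a single animal. It is only an illustration, not what data.table does internally; the variable name leg is made up for this example. For several animals you would need to combine id and site first (e.g. with paste(id, site)), which is what rleid(id, site) handles for you.

```r
# Toy detections for one animal: it visits a, then b, then returns to a
site <- c("a", "a", "b", "a", "c", "c", "c", "d")

# rle() finds runs of identical consecutive values
runs <- rle(site)

# Number each run and repeat that number once per row in the run
leg <- rep(seq_along(runs$lengths), times = runs$lengths)

print(data.frame(site, leg))
```

The second visit to site a gets its own leg number, which is what a plain group_by(id, site) cannot do.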

Then, for each individual, the great-circle distance between consecutive sites is calculated with geosphere::distHaversine. To avoid problems when individuals only have one or two records*, some checks are added:

df2[ , dist := if (.N == 1) {
  0
} else if (.N == 2) {
  c(0, distHaversine(c(lon[1], lat[1]), c(lon[2], lat[2])))
} else {
  c(0, distHaversine(as.matrix(.SD[ , .(lon, lat)])))
}, by = id]

#    id_site id site lat lon first_time last_time     dist
# 1:       1  A    a   1   1          1         2      0.0
# 2:       2  A    b   2   2          3         3 157401.6
# 3:       3  A    a   1   1          4         4 157401.6
# 4:       4  A    c   3   3          5         7 314755.2
# 5:       5  A    d   4   4          8         8 157281.8
# 6:       6  B    a   1   1          9         9      0.0
# 7:       7  B    b   2   2         10        10 157401.6
# 8:       8  C    a   1   1         11        11      0.0
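
As a rough cross-check on the dist column, the haversine formula can be written out in base R. This is a sketch, not geosphere's code: haversine_m is a hypothetical helper name, and the radius of 6378137 m is assumed to match the default of distHaversine. For the step from (lon 1, lat 1) to (lon 2, lat 2) it should come out close to the 157401.6 m shown above.

```r
# Haversine great-circle distance in metres between two lon/lat points (degrees).
# haversine_m is a hypothetical helper; r is assumed to match geosphere's default radius.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(1, sqrt(a)))
}

d <- haversine_m(1, 1, 2, 2)
print(d)
```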

Thus, for each individual, distance is calculated only once per new site. This contrasts with the other answer where distance calculations are performed between each time step (possibly many, it seems).
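
The question also asks for travel time between stations and for totals per animal. A possible sketch of that last step is below. It assumes a leg table shaped like the output above (the values are copied from it into a new table called legs so the example runs on its own), and the column names travel_time, total_dist and total_time are made up for illustration. Travel time is taken as arrival at a site minus departure from the previous site, with 0 for the first leg of each animal.

```r
library(data.table)

# Leg summary copied from the output above (one row per visit to a site)
legs <- data.table(
  id         = c("A", "A", "A", "A", "A", "B", "B", "C"),
  site       = c("a", "b", "a", "c", "d", "a", "b", "a"),
  first_time = c(1, 3, 4, 5, 8, 9, 10, 11),
  last_time  = c(2, 3, 4, 7, 8, 9, 10, 11),
  dist       = c(0, 157401.6, 157401.6, 314755.2, 157281.8, 0, 157401.6, 0)
)

# Travel time into each leg: arrival here minus departure from the previous site.
# shift() lags last_time within each id, so the first leg of each animal is NA.
legs[, travel_time := first_time - shift(last_time), by = id]

# Assumption: the first leg has no preceding trip, so count it as 0
legs[is.na(travel_time), travel_time := 0]

# Total distance and total travel time per individual
totals <- legs[, .(total_dist = sum(dist), total_time = sum(travel_time)), by = id]
print(totals)
```

Time spent sitting at a station (last_time - first_time) is deliberately left out of total_time here; add it in if you want total elapsed time instead.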


*Try e.g. distHaversine(cbind(1, 1)) (distGeo(cbind(1, 1))), or distHaversine(cbind(c(1, 1), c(1, 2))) (distGeo(cbind(c(1, 1), c(1, 2))))


Data

I added an individual with only one record as a test case.

id <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "C")
site <- c("a", "a", "b", "a", "c", "c", "c", "d", "a", "b", "a")
time <- seq(1:11)
lat <- c(1, 1, 2, 1, 3, 3, 3, 4, 1, 2, 1)
lon <- c(1, 1, 2, 1, 3, 3, 3, 4, 1, 2, 1)

df <- data.frame(id, site, time, lat, lon)