st_join is not precise enough for calculating whether a point is inside a polygon

62 Views Asked by At

I have one dataframe with the lat/long coords of different snakes, like this:

df1 <- data.frame(snake = c("snake_1", "snake_2"),
                  lat = c(40.68491, 40.8),
                  long = c(-83.34485, -83.3)) %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326)

> df1
Simple feature collection with 2 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -83.34485 ymin: 40.68491 xmax: -83.3 ymax: 40.8
Geodetic CRS:  WGS 84
    snake                   geometry
1 snake_1 POINT (-83.34485 40.68491)
2 snake_2         POINT (-83.3 40.8)

I have a second dataframe of sf polygons, like this:

df2 <- data.frame(lat = c(40.68477,
                          40.68477,
                          40.9923,
                          40.9923,
                          40.68477),
                  long = c(-83.51883,
                           -83.11233,
                           -83.11233,
                           -83.51883,
                           -83.51883)) %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>% 
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>% 
  mutate(region = "region_1")

> df2
Simple feature collection with 1 feature and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -83.51883 ymin: 40.68477 xmax: -83.11233 ymax: 40.9923
Geodetic CRS:  WGS 84
                        geometry   region
1 POLYGON ((-83.51883 40.6847... region_1

As you can see, both snake_1 and snake_2 should be within the bounds of the polygon, though snake_1 is close to the minimum latitude. You can see the two points in a ggplot:

ggplot() +
  geom_sf(data = df2, fill = "lightblue") +
  geom_sf(data = df1, color = "red", size = 3) +
  theme_minimal()

ggplot

My goal is to see whether each point is inside the polygon. When I do so with the following code, the resulting dataframe indicates that snake_1 is not inside the bounding box. How can this be? That point is definitely inside of the bounds of the polygon. Is there a different function I should be using besides st_join?

test <- st_join(df1, df2, join = st_within)
> test
Simple feature collection with 2 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -83.34485 ymin: 40.68491 xmax: -83.3 ymax: 40.8
Geodetic CRS:  WGS 84
    snake   region                   geometry
1 snake_1     <NA> POINT (-83.34485 40.68491)
2 snake_2 region_1         POINT (-83.3 40.8)
2

There are 2 best solutions below

1
margusl On BEST ANSWER

It can get bit tricky when dealing with polygons that are defined with just 4 corners while using an unprojected coordinate system. For example, one might assume that latitude values along the bottom edge of that df2 polygon are constant (i.e. bottom section is parallel to parallels), though it's not always the case:

# a linestring defined by geographic coordinates, CRS WGS 84
# latitude values for both ends are the same
line_1 <- sf::st_as_sfc("LINESTRING (-83.51883 40.68477, -83.11233 40.68477)", crs = "WGS84")

# but when adding more vertices to the linestring, our latitudes form a curve
# that "bends" behind snake_1 latitude
coords <- sf::st_coordinates(sf::st_segmentize(line_1, 2000))
coords[,"Y"]
#>  [1] 40.68477 40.68479 40.68481 40.68483 40.68485 40.68486 40.68488 40.68489
#>  [9] 40.68490 40.68491 40.68492 40.68493 40.68494 40.68494 40.68495 40.68495
#> [17] 40.68495 40.68495 40.68495 40.68494 40.68494 40.68493 40.68492 40.68491
#> [25] 40.68490 40.68489 40.68488 40.68486 40.68485 40.68483 40.68481 40.68479
#> [33] 40.68477
plot(coords)
abline(h = 40.68491, col = "red")
text(-83.53,40.68491, "snake_1 latitude", adj = c(0, -1))

One option is to add more vertices to polygons, though care should be taken when and how to do that. Here we'll just drop CRS for a moment so lat/lon values would be treated as planar coordinates, segmentize, and then restore original CRS.

library(sf)
#> Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
library(dplyr)
library(ggplot2)

df1 <- data.frame(snake = c("snake_1", "snake_2"),
                  lat = c(40.68491, 40.8),
                  long = c(-83.34485, -83.3)) %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326)

df2 <- data.frame(lat = c(40.68477,40.68477,40.9923,40.9923,40.68477),
                  long = c(-83.51883,-83.11233,-83.11233,-83.51883,-83.51883)) %>% 
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>% 
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON") %>% 
  mutate(region = "region_1")

# segmentize region polygon, add point per every .1 deg
df2_seg <- st_set_crs(df2, NA) %>% 
  st_segmentize(0.1) %>% 
  st_set_crs(4326)

# test with updates polygons
st_join(df1, df2_seg, join = st_within)
#> Simple feature collection with 2 features and 2 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -83.34485 ymin: 40.68491 xmax: -83.3 ymax: 40.8
#> Geodetic CRS:  WGS 84
#>     snake   region                   geometry
#> 1 snake_1 region_1 POINT (-83.34485 40.68491)
#> 2 snake_2 region_1         POINT (-83.3 40.8)

Here's another attempt to visualize this. Let's scale up our box, move it up north and use conical projection. It's not the best visual, but we can still see how 4-corner polygon differs from the one with more vertices.

bb1 <- 
  st_bbox(c(xmin = -85, xmax = -82, ymax = 66, ymin = 65.8), crs = "WGS84") %>% 
  st_as_sfc()
# 4 corners
st_as_text(bb1)
#> [1] "POLYGON ((-85 65.8, -82 65.8, -82 66, -85 66, -85 65.8))"

bb2 <- 
  st_bbox(c(xmin = -85, xmax = -82, ymax = 66, ymin = 65.8)) %>% 
  st_as_sfc() %>% 
  st_segmentize(0.1) %>% 
  st_set_crs("WGS84")

# point per every .1 deg
st_as_text(bb2)
#> [1] "POLYGON ((-85 65.8, -84.9 65.8, -84.8 65.8, -84.7 65.8, -84.6 65.8, -84.5 65.8, -84.4 65.8, -84.3 65.8, -84.2 65.8, -84.1 65.8, -84 65.8, -83.9 65.8, -83.8 65.8, -83.7 65.8, -83.6 65.8, -83.5 65.8, -83.4 65.8, -83.3 65.8, -83.2 65.8, -83.1 65.8, -83 65.8, -82.9 65.8, -82.8 65.8, -82.7 65.8, -82.6 65.8, -82.5 65.8, -82.4 65.8, -82.3 65.8, -82.2 65.8, -82.1 65.8, -82 65.8, -82 65.9, -82 66, -82.1 66, -82.2 66, -82.3 66, -82.4 66, -82.5 66, -82.6 66, -82.7 66, -82.8 66, -82.9 66, -83 66, -83.1 66, -83.2 66, -83.3 66, -83.4 66, -83.5 66, -83.6 66, -83.7 66, -83.8 66, -83.9 66, -84 66, -84.1 66, -84.2 66, -84.3 66, -84.4 66, -84.5 66, -84.6 66, -84.7 66, -84.8 66, -84.9 66, -85 66, -85 65.9, -85 65.8))"

ggplot() +
  geom_sf(data = bb2, aes(fill = "seg")) +
  geom_sf(data = bb1, aes(fill = "4-point"), alpha = .8) +
  scale_fill_brewer(type = "qual") +
  coord_sf(crs = "+proj=lcca +lon_0=-83 +lat_0=70") +
  theme_minimal()

Created on 2024-03-16 with reprex v2.1.0

1
cristian-vargas On

Your code works as-is if you set sf_use_s2(FALSE) first. By default, the latest release of sf uses spherical geometry (rather than planar geometry) for geographical coordinate operations.

sf_use_s2(FALSE)

test <- sf::st_join(df1, df2, join = st_within)

The output looks like:

Simple feature collection with 2 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -83.34485 ymin: 40.68491 xmax: -83.3 ymax: 40.8
Geodetic CRS:  WGS 84
    snake   region                   geometry
1 snake_1 region_1 POINT (-83.34485 40.68491)
2 snake_2 region_1         POINT (-83.3 40.8)