Import shapefile and join to coordinates to create spatial dataframe in R

91 Views Asked by At

Before maptools and rgdal were deprecated, I used this code to import a shapefile, centroids csv and commuting count data, to create a flow map using the code below (I removed the ggplot code for legibility). I am trying to recreate this flow map with recent commuting data to compare changes between both time periods.

I read that read_sf/st_read was used to replace the readOGR function, so I used st_read to import the shapefile.

To join the shapefile to the centroid points, I need to get the polygon ID which is the common field for this join. However, the line soton@data$id = rownames(soton@data) which I'm using to get the polygon IDs that are stored as row names is throwing this error:

Error in soton@data : no applicable method for @ applied to an object of class "sf"

This code worked fine with maptools and rgdal, and I am using the exact same shapefile from before. I'll really appreciate help on tweaking my code to work with the new packages. Thank you.

library(tidyverse)  
library(plyr)  
library(ggplot2)  
library(maptools) -- replaced with library(sf)  
library(rgdal)    -- removed this and loaded sp package  
setwd("your working directory")
soton=st_read(dsn="SouthamptonMSOA",layer="SouthamptonOA")
ODmat=read.csv("ODSouthampton.csv")
centroids=read.csv("CentroidSouthampton.csv")
popdat=read.csv("SouthamptonWZ.csv")
ODmat$frx=0
ODmat$fry=0
ODmat$tox=0
ODmat$toy=0
for (i in 1:dim(ODmat)[1]){
ODmat$frx[i]=centroids$East[which(centroids$Code==ODmat$Origin[i])]
ODmat$fry[i]=centroids$North[which(centroids$Code==ODmat$Origin[i])]
ODmat$tox[i]=centroids$East[which(centroids$Code==ODmat$Destination[i])]
ODmat$toy[i]=centroids$North[which(centroids$Code==ODmat$Destination[i])]}
soton@data$id = rownames(soton@data)
soton.points = fortify(soton, region="id")
soton.df = join(soton.points, soton@data, by="id")
ODmatsub=subset(ODmat,Commuters>50)
figplot = ggplot()+
  # ... 

UPDATE: In the comments, it was suggested that I convert the centroids data to an sf object before joining to the shapefile. I have done this but now stuck on how to join the shapefile to the centroids data, bearing in mind that I also have to join the origin-destination data (ODSouthampton) to the centroids to create the flow lines.

Here's a sample of the centroids data(csv):

Code Name East North
E02003549 Southampton 001 442184.28 116144.73
E02003550 Southampton 002 439810.63 116092.19
E02003551 Southampton 003 443938.17 115842.76
E02003552 Southampton 004 439018.21 115633.94
E02003553 Southampton 005 443248.97 115497.68

Sample of the ODSouthampton data (csv):

Origin Destination Commuters
E02003549 E02003549 22
E02003549 E02003550 25
E02003549 E02003551 13
E02003549 E02003552 12
E02003549 E02003553 24

I also have workplace population data which I will display as points on the map, and it's in a csv format. Will I need to convert all 3 csv files to sf objects? Sample of the population data is below:

MSOA_Code WZ_Population POINT_X POINT_Y
E02003549 2149 442184.28 116144.73
E02003550 3227 439810.63 116092.19
E02003551 1998 443938.17 115842.76
E02003552 1596 439018.21 115633.94
E02003553 1772 443248.97 115497.68
3

There are 3 best solutions below

1
Grzegorz Sapijaszko On

Let me explain all steps:

Data preparation, as we don't have access to your SouthamptonMSOA neither csv files:

# b <- osmdata::getbb("Southampton")
# boundaries <- osmdata::opq(b, timeout = 25*60) |>
#   osmdata::add_osm_feature(key = "boundary", value = "administrative") |>
#   osmdata::osmdata_sf()
# 
# boundaries$osm_multipolygons |>
#   sf::st_write(dsn = "data/south.gpkg")
# 
# boundaries$osm_multipolygons |>
#   sf::st_centroid() |>
#   sf::st_coordinates() |>
#   write.csv(file = "data/centroids.csv")

You can skip above part because you have data on hand.

Let's read in the layer:

sh <- sf::st_read(dsn = "data/south.gpkg")
#> Reading layer `south' from data source `data/south.gpkg' using driver `GPKG'
#> Simple feature collection with 19 features and 129 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84

Just to show the content:

sh |>
  subset(select = c("name", "admin_level"))
#> Simple feature collection with 19 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>              name admin_level                           geom
#> 1     Test Valley           8 MULTIPOLYGON (((-1.308833 5...
#> 2       Eastleigh           8 MULTIPOLYGON (((-1.312912 5...
#> 3     Southampton           6 MULTIPOLYGON (((-1.354701 5...
#> 4      New Forest           8 MULTIPOLYGON (((-1.347534 5...
#> 5       Hampshire           6 MULTIPOLYGON (((-0.9320425 ...
#> 6       Bursledon          10 MULTIPOLYGON (((-1.309554 5...
#> 7  Hamble-le-Rice          10 MULTIPOLYGON (((-1.349241 5...
#> 8           Hound          10 MULTIPOLYGON (((-1.349241 5...
#> 9       Hedge End          10 MULTIPOLYGON (((-1.320232 5...
#> 10       West End          10 MULTIPOLYGON (((-1.354701 5...

Let's read in csv file and show it's content:

c <- read.csv("data/centroids.csv") 
c
#>    X.1         X        Y
#> 1    1 -1.510652 51.12843
#> 2    2 -1.327202 50.93302
#> 3    3 -1.398709 50.91700
#> 4    4 -1.624719 50.85839
#> 5    5 -1.287777 51.06381
#> 6    6 -1.315738 50.88791
#> 7    7 -1.323930 50.86144
#> 8    8 -1.347316 50.87588
#> 9    9 -1.303966 50.91370
#> 10  10 -1.329964 50.93579
#> 11  11 -1.325985 50.97084
#> 12  12 -1.413366 50.96088
#> 13  13 -1.468417 50.95084
#> 14  14 -1.497882 50.98122
#> 15  15 -1.412990 50.86973
#> 16  16 -1.448976 50.89033
#> 17  17 -1.494418 50.91691
#> 18  18 -1.495827 50.85305
#> 19  19 -1.359211 50.95967

Now, we convert csv (data.frame) to sf object:

c <- c|>
  sf::st_as_sf(coords = c("X", "Y"), crs = "EPSG:4326")

c
#> Simple feature collection with 19 features and 1 field
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -1.624719 ymin: 50.85305 xmax: -1.287777 ymax: 51.12843
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>    X.1                   geometry
#> 1    1 POINT (-1.510652 51.12843)
#> 2    2 POINT (-1.327202 50.93302)
#> 3    3   POINT (-1.398709 50.917)
#> 4    4 POINT (-1.624719 50.85839)
#> 5    5 POINT (-1.287777 51.06381)
#> 6    6 POINT (-1.315738 50.88791)
#> 7    7  POINT (-1.32393 50.86144)
#> 8    8 POINT (-1.347316 50.87588)
#> 9    9  POINT (-1.303966 50.9137)
#> 10  10 POINT (-1.329964 50.93579)

Let's use only 2 polygons from our data set, with admin_level == 6

sh |>
  subset(admin_level == "6", select = c("name", "geom")) 
#> Simple feature collection with 2 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84
#>          name                           geom
#> 3 Southampton MULTIPOLYGON (((-1.354701 5...
#> 5   Hampshire MULTIPOLYGON (((-0.9320425 ...

And finally we join them together:

sh |>
  subset(admin_level == "6", select = c("name", "geom")) |>
  sf::st_join(c)
#> Simple feature collection with 19 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84
#> First 10 features:
#>            name X.1                           geom
#> 3   Southampton   3 MULTIPOLYGON (((-1.354701 5...
#> 5     Hampshire   1 MULTIPOLYGON (((-0.9320425 ...
#> 5.1   Hampshire   2 MULTIPOLYGON (((-0.9320425 ...
#> 5.2   Hampshire   4 MULTIPOLYGON (((-0.9320425 ...
#> 5.3   Hampshire   5 MULTIPOLYGON (((-0.9320425 ...
#> 5.4   Hampshire   6 MULTIPOLYGON (((-0.9320425 ...
#> 5.5   Hampshire   7 MULTIPOLYGON (((-0.9320425 ...
#> 5.6   Hampshire   8 MULTIPOLYGON (((-0.9320425 ...
#> 5.7   Hampshire   9 MULTIPOLYGON (((-0.9320425 ...
#> 5.8   Hampshire  10 MULTIPOLYGON (((-0.9320425 ...

Or opposite way, joining boundaries to points:

sh <- sh |>
  subset(admin_level == "6", select = c("name", "geom")) 
sh
#> Simple feature collection with 2 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: -1.957281 ymin: 50.70574 xmax: -0.7293437 ymax: 51.38392
#> Geodetic CRS:  WGS 84
#>          name                           geom
#> 3 Southampton MULTIPOLYGON (((-1.354701 5...
#> 5   Hampshire MULTIPOLYGON (((-0.9320425 ...

csh <- c |>
  sf::st_join(sh)

tmap::tm_shape(sh) +
  tmap::tm_polygons() +
  tmap::tm_shape(csh) +
  tmap::tm_dots("name")

Created on 2024-03-05 with reprex v2.1.0

0
Grzegorz Sapijaszko On

Based on your data. I'm leaving previous answer as a general one how to join spatial data with csv.

Please note, in below example boundaries are taken from {geographr} package.

centroids <- read.csv("~/projekty/test/data/centroids.csv")

boundaries <- geographr::boundaries_msoa11 |>
  subset(msoa11_code %in% centroids$Code)

population <- read.csv("~/projekty/test/data/population.csv") |>
  subset(select = c("MSOA_Code", "WZ_Population"))

From population data set I have taken subset of MSOA code and population only, centroids are not necessary, as we will add population info to our boundaries/polygons.

boundaries <- boundaries |>
  dplyr::left_join(population, by = c("msoa11_code" = "MSOA_Code")) |>
  sf::st_transform(crs = "EPSG:27700")

Now, we will read in ODSoupthampton data and create linestrings by joining with centroids for origin and destination, converting it to spatial data frame and assigning EPSG:27700 CRS.

od <- read.csv("~/projekty/test/data/ODSouthampton.csv", strip.white = TRUE) |>
  dplyr::left_join(centroids, by = c("Origin" = "Code")) |>
  dplyr::rename(c("o_East" = "East", "o_North"= "North")) |>
  dplyr::left_join(centroids, by = c("Destination" = "Code")) |>
  dplyr::rename(c("d_East" = "East", "d_North"= "North")) |>
  dplyr::mutate(geom = paste0("LINESTRING(", o_East, " ", o_North, ", ", d_East, " ", d_North, ")")) |>
  sf::st_as_sf(wkt = "geom", crs = "EPSG:27700") |>
  subset(select = c(1:3, 10))

And final plot. Please note that 22 commuters are from E02003549 to E02003549 which obviously doesn't make sense to plot, but..

tmap::tm_shape(boundaries) +
  tmap::tm_polygons("WZ_Population") +
  tmap::tm_shape(od) +
  tmap::tm_lines(lwd = "Commuters",
                 lwd.scale = tmap::tm_scale_categorical())

Created on 2024-03-05 with reprex v2.1.0

0
veesilouette On

I resolved this with the od package for manipulating and mapping origin-destination data. I have upvoted the comment by @Grzegorz which was useful in converting the shapefiles and joining the centroids to it, in my use case.