I have a dataframe Team5 in R with a column country_code that contains abbreviations of countries. I am using
Team5$Country <- countrycode(Team5$country_code, "iso2c", "country.name")
from the countrycode package to translate the abbreviations into country names. Now I want to calculate the geographic distance between the countries (possibly via their centroids), but I don't have any data on longitude or latitude. How can I calculate a rough distance between the countries and place it into a new column Distance?
This is a sample of my data as dput() output:
structure(list(ALL_ID = c(1240, 3640, 3087, 3877, 4317, 2671,
1398, 9433, 18089, 200, 3137, 7398, 21148, 22187, 167, 5814,
292, 1908, 1451, 22795), Country = c("Ireland", "Australia",
"Switzerland", "United Kingdom", "Angola", "Netherlands", "Spain",
"Spain", "Spain", "Austria", "Indonesia", "France", "Canada",
NA, "Germany", "Turkey", "South Africa", "Canada", "Australia",
"Russia")), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -20L), groups = structure(list(ALL_ID = c(167,
200, 292, 1240, 1398, 1451, 1908, 2671, 3087, 3137, 3640, 3877,
4317, 5814, 7398, 9433, 18089, 21148, 22187, 22795), .rows = structure(list(
15L, 10L, 17L, 1L, 7L, 19L, 18L, 6L, 3L, 11L, 2L, 4L, 5L,
16L, 12L, 8L, 9L, 13L, 14L, 20L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -20L), .drop = TRUE))
You would need an external spatial data source that you can join to your existing dataset. Quite a few of those provide country polygons along with country names and codes, usually distributed as a Shapefile, GeoPackage or GeoJSON. Many such sources are also accessible directly through R packages, giscoR being one of those options: it allows easy access to Eurostat GISCO data (the list of countries is not limited to the EU), and it also ships with some lightweight offline datasets.
From the polygons you can get centroids and build a distance matrix of those centroids; this is where the sf package comes into play.
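The centroid and distance-matrix step could look roughly like the sketch below. It is not the original code from this answer, only a minimal illustration under a few assumptions: that gisco_get_countries() returns an sf object with an ISO3_CODE column, that matching on ISO3 codes (obtained via countrycode) is safer than matching on English names, which may be spelled differently between sources, and that a hypothetical vector of four country names from the sample is enough to show the idea.

```r
library(sf)
library(giscoR)
library(countrycode)

# A few country names from the question's sample (the subset is arbitrary)
country_names <- c("Ireland", "Austria", "Switzerland", "Spain")

# Assumption: ISO3 codes are a more robust join key than English names
iso3 <- countrycode(country_names, "country.name", "iso3c")

# Country polygons at 1:20M; this may download and cache data on first use
polys <- gisco_get_countries(resolution = "20")
polys <- polys[polys$ISO3_CODE %in% iso3, ]

# Polygon centroids (rough by design: islands and overseas areas shift them)
cents <- st_centroid(st_geometry(polys))

# Pairwise distances between centroids; sf returns metres for lon/lat data,
# so drop the units and convert to kilometres
dist_km <- units::drop_units(st_distance(cents)) / 1000
dimnames(dist_km) <- list(polys$ISO3_CODE, polys$ISO3_CODE)

round(dist_km)
```

A single pair can then be looked up by code, for example dist_km["AUT", "CHE"].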
Adding "a" distance column is a bit tricky. If you think it through, you'll notice that you have to widen your dataset (add columns), lengthen it (add rows), or use some other means to pack the distances for all country pairs (e.g. store lists of distances in a single column).
The most straightforward method is probably just going with a wide format and joining the distance matrix to your original dataset:
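As a rough, base-R illustration of the wide-format join (and of the long alternative), consider the sketch below. The three-row team5 data frame is a cut-down stand-in for the question's data, and the numbers in dist_km are placeholders chosen to be obviously artificial, not real distances; the dist_to_ column prefix is likewise just a naming choice made for this example.

```r
# Cut-down stand-in for the question's data (ungrouped, three rows)
team5 <- data.frame(
  ALL_ID  = c(1240, 3087, 200),
  Country = c("Ireland", "Switzerland", "Austria")
)

# PLACEHOLDER symmetric distance matrix; the values are made up
countries <- c("Austria", "Ireland", "Switzerland")
dist_km <- matrix(
  c(  0, 100, 200,
    100,   0, 300,
    200, 300,   0),
  nrow = 3, dimnames = list(countries, countries)
)

# Wide format: one distance column per destination country
dist_wide <- data.frame(Country = rownames(dist_km), dist_km,
                        row.names = NULL, check.names = FALSE)
names(dist_wide)[-1] <- paste0("dist_to_", colnames(dist_km))

# Left join on the country name keeps every row of the original data
team5_wide <- merge(team5, dist_wide, by = "Country", all.x = TRUE)

# Long format alternative: one row per (from, to) country pair
dist_long <- as.data.frame(as.table(dist_km), responseName = "dist_km",
                           stringsAsFactors = FALSE)
names(dist_long)[1:2] <- c("from", "to")

team5_wide
```

The wide table grows by one column per country, so for many countries the long table (or filtering it down to the pairs of interest) may be easier to work with.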
For reference, polygons and centroids of AT & CH:
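A plot of that kind could be produced along the lines of the sketch below. This is an assumed reconstruction, not the original figure code: it presumes that gisco_get_countries() accepts the Eurostat codes "AT" and "CH" through its country argument, and the styling choices are arbitrary.

```r
library(sf)
library(giscoR)
library(ggplot2)

# Polygons for Austria and Switzerland; may download data on first use
at_ch <- gisco_get_countries(country = c("AT", "CH"), resolution = "20")

# Centroids wrapped in an sf object so that geom_sf() can draw them
cents <- st_sf(geometry = st_centroid(st_geometry(at_ch)))

# Polygons as a light fill, centroids as red points on top
p <- ggplot() +
  geom_sf(data = at_ch, fill = "grey90") +
  geom_sf(data = cents, colour = "red", size = 3) +
  theme_minimal()

p
```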
Input data: the Team5 sample from the question.
Created on 2023-06-26 with reprex v2.0.2