I have a dataframe which includes the column "country" with various country names.
I want to find out which countries (say, UN member states) are missing.
Is there a quick, automated way to do that, perhaps with the package countrycode?
Here is my dput:
```r
structure(list(country = c("Albania", "Algeria", "Angola", "Antigua and Barbuda",
"Argentina", "Armenia", "Australia", "Austria", "Azerbaijan",
"Bahamas", "Bahrain", "Bangladesh", "Barbados", "Belarus", "Belgium",
"Bhutan", "Bolivia", "Bosnia and Herzegovina", "Botswana", "Brazil",
"Brunei", "Bulgaria", "Burkina Faso", "Cambodia", "Canada", "Chile",
"Colombia", "Costa Rica", "Cote d'Ivoire", "Croatia", "Cuba",
"Czechia", "Democratic Republic of the Congo", "Denmark", "Djibouti",
"Dominica", "Dominican Republic", "Ecuador", "Egypt", "El Salvador",
"Eritrea", "Estonia", "Ethiopia", "Fiji", "Finland", "France",
"Gabon", "Georgia", "Germany", "Ghana", "Greece", "Guatemala",
"Guinea", "Guyana", "Honduras", "Hungary", "Iceland", "India",
"Indonesia", "Iran", "Iraq", "Ireland", "Israel", "Italy", "Jamaica",
"Japan", "Jordan", "Kazakhstan", "Kenya", "Kuwait", "Kyrgyzstan",
"Laos", "Latvia", "Lebanon", "Lesotho", "Liechtenstein", "Lithuania",
"Luxembourg", "Macedonia", "Madagascar", "Malawi", "Malaysia",
"Malta", "Mauritania", "Mauritius", "Mexico", "Micronesia", "Moldova",
"Monaco", "Mongolia", "Morocco", "Myanmar", "Namibia", "Nepal",
"Netherlands", "New Zealand", "Nicaragua", "Niger", "Nigeria",
"Norway", "Oman", "Pakistan", "Palau", "Panama", "Papua New Guinea",
"Paraguay", "People's Republic of China", "Peru", "Philippines",
"Poland", "Portugal", "Qatar", "Romania", "Russia", "Rwanda",
"Samoa", "San Marino", "Saudi Arabia", "Senegal", "Serbia", "Singapore",
"Slovakia", "Slovenia", "South Africa", "South Korea", "Spain",
"Sri Lanka", "Sudan", "Suriname", "Sweden", "Switzerland", "Syria",
"Taiwan", "Tajikistan", "Tanzania", "Thailand", "Tonga", "Trinidad and Tobago",
"Tunisia", "Turkey", "U.K.", "U.S.A.", "Uganda", "Ukraine", "United Arab Emirates",
"Uruguay", "Uzbekistan", "Venezuela", "Vietnam", "Yemen", "Zambia",
"Zimbabwe")), row.names = c(NA, -152L), class = c("tbl_df", "tbl",
"data.frame"))
```
You can certainly get a vector of the "countries" stored in `countrycode` that are missing from your own data.
However, while this vector contains many extant countries that are missing from your data (such as Afghanistan, Belize and Benin), some entries are semi-autonomous regions that are not countries in their own right (Jersey, Zanzibar, Gibraltar), and others are historical and no longer exist as countries (e.g. Yugoslavia).
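One way to build such a vector is sketched below, assuming the `countrycode` package is installed. It standardises your free-text names to ISO 3166-1 alpha-3 codes with `countrycode()` and takes the set difference against the `iso3c` column of the package's bundled `codelist` data frame. The small `df` is a hypothetical stand-in for the full `dput` output. Restricting to entries that have an ISO code may drop some historical entities, but dependencies such as Jersey or Gibraltar have ISO codes of their own and will still appear.

```r
library(countrycode)

# Hypothetical stand-in for the full dput above; replace with your own data.
df <- data.frame(country = c("Albania", "Algeria", "U.K.", "U.S.A.",
                             "People's Republic of China", "Macedonia"))

# Standardise names to ISO3 codes so spelling variants ("U.K.",
# "United Kingdom") compare equal. Unmatched names become NA and
# countrycode() warns about them, which is a handy check in itself.
df$iso3c <- countrycode(df$country, origin = "country.name",
                        destination = "iso3c")

# Every entity in codelist that has an ISO3 code ...
all_iso3c <- unique(codelist$iso3c[!is.na(codelist$iso3c)])

# ... minus the ones already present in the data.
missing_iso3c <- setdiff(all_iso3c, df$iso3c)

# Translate the codes back to English names for reading.
missing_names <- countrycode(missing_iso3c, origin = "iso3c",
                             destination = "country.name")
head(sort(missing_names))
```

Matching on codes rather than raw strings is the main design choice here: it avoids false "missing" entries caused purely by spelling differences between your data and `codelist`.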
To filter out entries that are not current countries, I might use something like `rnaturalearth`.
This gives you a reasonable list of 28 current countries that are not in your original list. Most of these are UN members, but not all: to the best of my knowledge, Greenland, Antarctica, Kosovo, New Caledonia, Somaliland and Puerto Rico do not have independent representation at the UN at the time of writing.
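A rough sketch of such an `rnaturalearth` filter follows, assuming `rnaturalearth`, `rnaturalearthdata` (which ships the 1:110m and 1:50m data) and `sf` are installed. It takes the long country names from `ne_countries()`, standardises both sides with `countrycode()` so spelling variants compare equal, and keeps any Natural Earth name that `countrycode` cannot map unchanged. The hypothetical `df` again stands in for your data. The result depends on the Natural Earth scale and version; the coarse 1:110m layer omits many small states, so `scale = 50` is likely the better choice if small UN members matter.

```r
library(countrycode)
library(rnaturalearth)

# Hypothetical stand-in for the full dput above; replace with your own data.
df <- data.frame(country = c("Albania", "Algeria", "U.K.", "U.S.A.",
                             "People's Republic of China", "Macedonia"))

# Natural Earth admin-0 countries (scale 110 data lives in rnaturalearthdata).
world <- ne_countries(scale = 110, returnclass = "sf")
ne_names <- world$name_long

# Standardise Natural Earth names; anything countrycode cannot match
# becomes NA, so fall back to the original Natural Earth name for it.
ne_std <- suppressWarnings(
  countrycode(ne_names, origin = "country.name", destination = "country.name")
)
ne_std <- ifelse(is.na(ne_std), ne_names, ne_std)

# Standardise the user's names the same way.
df_std <- suppressWarnings(
  countrycode(df$country, origin = "country.name", destination = "country.name")
)

# Current countries on the map that are absent from the data.
missing_current <- sort(setdiff(ne_std, df_std))
missing_current
```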
Created on 2023-09-28 with reprex v2.0.2