I wrote a function using geopy to return the city for a set of coordinates (latitude and longitude). However, the function returned a city for only about 10% of the entries. When I run the code on single entries it always returns the city, so nothing is wrong with the individual rows of data. Here is the function I wrote:
# importing libraries
from tkinter import *
from geopy.geocoders import Nominatim
from geopy.geocoders import Photon

# create an instance of tkinter frame
win = Tk()
# define geometry of the window
win.geometry("700x350")

# creating a function
def get_city(coords):
    # instantiate the Nominatim API
    geolocator = Nominatim(user_agent="MyApp")
    # get the city from the coordinates
    location = geolocator.reverse(coords)
    address = location.raw['address']
    city = address.get('city', '')
    # return the city
    return city

# applying function to dataframe
irma['city'] = irma['coordinates'].apply(get_city)
I was expecting the function to return the city for every row, but it returned a city for only about 10% of the rows.
[Image: first five entries of the dataframe, showing a city being returned for only one row]
This is because OSM attribute data is highly incomplete. Checking just the first coordinates in your dataframe, the raw dictionary does contain an 'address' key, but that address has no 'city' entry, while it does have 'town' and even 'road'. In your case you may actually want 'town' here.
This is my simple code to get the results for the first coordinates:
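A minimal sketch of that kind of check, offered as an illustration rather than the exact snippet. It assumes that falling back from 'city' to other settlement keys is acceptable for your use case. The helper names `extract_place` and `get_place`, the key order in `PLACE_KEYS`, and the sample address are all hypothetical; only `Nominatim(...)`, `reverse(...)` and `location.raw` come from the code in the question.

```python
# Keys Nominatim may use for a settlement, tried in order.
# The ordering is an assumption; adjust it to what "city" means for your data.
PLACE_KEYS = ("city", "town", "village", "hamlet", "municipality")


def extract_place(address):
    """Return the first settlement-like value in a Nominatim address dict, or ''."""
    for key in PLACE_KEYS:
        if address.get(key):
            return address[key]
    return ""


def get_place(coords):
    """Reverse-geocode coords and return a place name. Makes a network request."""
    # Imported here so extract_place can be tried without geopy installed.
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent="MyApp")
    location = geolocator.reverse(coords)
    # reverse() returns None when nothing is found for the coordinates.
    if location is None:
        return ""
    return extract_place(location.raw.get("address", {}))


# Offline illustration with a made-up address dictionary (no 'city' key).
sample_address = {"road": "Main Street", "town": "Exampletown", "country": "Exampleland"}
print(extract_place(sample_address))
```

To see which keys a given row really has, printing `location.raw['address']` for that row is the quickest check. Applying `get_place` instead of `get_city` to the 'coordinates' column would then fill in towns and villages as well, at the cost of mixing different kinds of places in one column.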
If you only need cities, I suggest using a local shapefile from official sources (city boundaries are usually available for each Census dataset) and using geopandas to do a point-in-polygon spatial join, i.e. matching the points in your dataframe to the city polygons they fall inside.
    gpd.sjoin(gpd_of_your_dataframe, city_polygon_df, predicate='within')

(Older versions of geopandas spell this argument op='within'.) This would be much faster, needs no OSM API, and will be highly accurate because you are using official datasets.
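The spatial join can be sketched end to end with toy data. Everything here is an assumption made for illustration: the column names `latitude`, `longitude` and `city`, the two square "cities", and the coordinates. In practice the polygons would come from an official boundary file loaded with `gpd.read_file(...)`, and the 'coordinates' column from the question would first need to be split into numeric latitude and longitude.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Hypothetical stand-in for the question's dataframe, with separate lat/lon columns.
points_df = pd.DataFrame({
    "latitude": [0.5, 0.5, 5.0],
    "longitude": [0.5, 2.5, 5.0],
})

# Toy city boundaries: two unit squares. Real data would come from gpd.read_file(...).
city_polygons = gpd.GeoDataFrame(
    {"city": ["Alphaville", "Betatown"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (3, 0), (3, 1), (2, 1)]),
    ],
    crs="EPSG:4326",
)

# Turn the plain dataframe into point geometries. Note the (x, y) = (lon, lat) order.
points_gdf = gpd.GeoDataFrame(
    points_df,
    geometry=gpd.points_from_xy(points_df["longitude"], points_df["latitude"]),
    crs="EPSG:4326",
)

# A left join keeps every point; points outside all polygons get NaN for 'city'.
joined = gpd.sjoin(points_gdf, city_polygons, how="left", predicate="within")
print(joined[["latitude", "longitude", "city"]])
```

Both layers need the same coordinate reference system before joining; if the boundary file uses a different one, `city_polygons.to_crs(points_gdf.crs)` would bring them in line.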