I have two different datasets (samples below):
I am trying to find, for each site in dataset 1, the distance to the nearest site in dataset 2. Each location in dataset 1 would then have a column showing the distance to the closest site in dataset 2.
So far I have this, but I can't get it to work. Any advice on how to proceed is appreciated.
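from geopy import distance
import pandas as pd

# dataset1 and dataset2 are pandas DataFrames with
# 'site_id', 'latitude' and 'longitude' columns.
# Iterating over a dict's items yields column names and whole columns,
# not individual sites, so iterate over DataFrame rows instead.
s = dataset1[['site_id', 'latitude', 'longitude']].copy()
d = dataset2[['site_id', 'latitude', 'longitude']]

min_dist = []
for a in s.itertuples(index=False):
    best = None
    dist = None
    for b in d.itertuples(index=False):
        # geopy expects a (latitude, longitude) pair for each point
        km = distance.distance((a.latitude, a.longitude), (b.latitude, b.longitude)).km
        if dist is None or km < dist:
            best = b.site_id
            dist = km
    min_dist.append(dist)
    print(f'{a.site_id} is nearest {best}: {dist} km')

# New column: distance (km) from each dataset-1 site to its closest dataset-2 site
s['min_distance_km'] = min_dist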
You can use the Haversine formula to calculate the great-circle distance between two points from their latitude and longitude coordinates. Here's an example of how you can modify your code to use the Haversine formula.
Update your code with the following:
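A minimal sketch of what a Haversine-based version could look like, using only the standard library. It assumes coordinates are in decimal degrees and uses a mean Earth radius of 6371.0088 km; the names `haversine_km` and `nearest_site` are illustrative, not taken from the original answer. Because the formula treats the Earth as a sphere, results will differ slightly from geopy's ellipsoidal `distance.distance`.

```python
import math

# Mean Earth radius in km (spherical-Earth assumption).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against floating-point rounding pushing sqrt(h) just above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_site(site, candidates):
    """Return (site_id, km) of the candidate closest to `site`.

    `site` and every candidate are (site_id, latitude, longitude) tuples.
    Returns (None, inf) if `candidates` is empty.
    """
    _, lat, lon = site
    best_id, best_km = None, math.inf
    for cand_id, cand_lat, cand_lon in candidates:
        km = haversine_km(lat, lon, cand_lat, cand_lon)
        if km < best_km:
            best_id, best_km = cand_id, km
    return best_id, best_km
```

To build the new column, one option is to turn dataset 2 into a list of plain tuples once with `dataset2[['site_id', 'latitude', 'longitude']].itertuples(index=False, name=None)`, then call `nearest_site` for each row of dataset 1 (iterated the same way) and keep the second element of each result. This is still a nested loop, so it compares every pair of sites.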