import pandas as pd
from scipy.spatial.distance import cityblock

# Sample dataset with latitude and longitude (in degrees)
patient = pd.DataFrame({
    'Patient': ['pt1'],
    'lat': [34.0522],
    'lon': [-118.2437],
})
hospital = pd.DataFrame({
    'hospital': ['h1'],
    'LATITUDE': [35.0522],
    'LONGITUDE': [-133.2437],
})

# Extract scalar coordinates so cityblock receives two 1-D vectors
patient_point = (patient['lat'].iloc[0], patient['lon'].iloc[0])
hospital_point = (hospital['LATITUDE'].iloc[0], hospital['LONGITUDE'].iloc[0])

# Manhattan distance: |delta lat| + |delta lon|
# (named dist so it does not shadow scipy.spatial.distance)
dist = cityblock(patient_point, hospital_point)
I am calculating the distance between latitude-longitude points using the Manhattan distance (cityblock).
But what is the unit of measurement in the results?
Edited to add a reproducible example.
There's no sensible unit you could attach to the result of this calculation.
An analogy: imagine you are measuring the perimeter of your rectangular house. On two of the sides, you measure it as 20 feet. On two other sides, you measure it as 10 meters. You add this up: 20 + 20 + 10 + 10 = 60. Now, which unit do you use for the result, feet or meters? The answer is that neither is correct: this is adding together units of distance of different length.
Similarly, a degree of latitude and a degree of longitude span different distances. A degree of latitude spans approximately 111 km. (This follows from the original definition of the meter as one ten-millionth of the distance from the equator to the North Pole, which makes 90 degrees of latitude roughly 10,000 km.) A degree of longitude is more complicated. At the equator, it spans about the same distance as a degree of latitude. As you move away from the equator, a degree of longitude spans less distance. More specifically, a degree of longitude spans approximately 111 km * cos(latitude).
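To make the cos(latitude) relationship concrete, here is a minimal sketch. The function name `km_per_degree_longitude` and the constant are mine, and the 111 km figure is the approximation used above, so treat the results as rough.

```python
import math

# Approximate length of one degree of latitude, as quoted above
KM_PER_DEGREE_LATITUDE = 111.0

def km_per_degree_longitude(latitude_deg):
    """Approximate length in km of one degree of longitude at a given latitude."""
    return KM_PER_DEGREE_LATITUDE * math.cos(math.radians(latitude_deg))

# A degree of longitude shrinks as you move away from the equator
for lat in (0, 34, 60):
    print(lat, round(km_per_degree_longitude(lat), 1))
```

At the equator this gives the full 111 km, and at 60 degrees it gives about half of that.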
Imagine you have two patients, each a kilometer away from the hospital: one lives exactly north of the hospital, and one lives exactly east of it. Assume the hospital is at latitude 34 degrees, where a degree of longitude spans only about 111 km * cos(34°) ≈ 92 km. If you measure distance in degrees, you will compute that the patient to the north is about 17% closer than the patient to the east, even though both are the same physical distance away.
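You can check the two-patient comparison numerically. This is a rough sketch that assumes 111 km per degree of latitude and a hospital at exactly 34 degrees; the variable names are just for illustration.

```python
import math

KM_PER_DEGREE = 111.0   # approximate km per degree of latitude
hospital_lat = 34.0     # hospital latitude in degrees (assumed for this example)

# Offset in degrees for a patient 1 km due north of the hospital
north_offset_deg = 1.0 / KM_PER_DEGREE

# Offset in degrees for a patient 1 km due east of the hospital
east_offset_deg = 1.0 / (KM_PER_DEGREE * math.cos(math.radians(hospital_lat)))

# Ratio of the two "distances in degrees"; both patients are really 1 km away
ratio = north_offset_deg / east_offset_deg
print(north_offset_deg, east_offset_deg, ratio)
```

The ratio works out to cos(34°), roughly 0.83, so the same physical distance looks noticeably shorter in the north-south direction than in the east-west direction.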
If you want a measure of distance between two lat/lon pairs that has a physical meaning, I would suggest computing the haversine distance instead. You could use this package or this sklearn function to do this.
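If you would rather not add a dependency, the haversine formula is short enough to write by hand. The sketch below is my own illustration, not the implementation from either library: `haversine_km` is a hypothetical helper, and it assumes a spherical Earth with a mean radius of 6371 km, so results are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a perfect sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Patient and hospital coordinates from the question
d = haversine_km(34.0522, -118.2437, 35.0522, -133.2437)
print(round(d, 1), "km")
```

Unlike the cityblock result, this value has a well-defined unit (kilometers), because the longitude difference is scaled by the latitude before the two components are combined.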