Patients often bypass their nearest hospital and go to another hospital for surgery, for many reasons. I have 500,000 patient episodes for patients attending 24 hospitals in the UK.
I want to know the proportion of patients attending a hospital that wasn't their nearest option. So if a hospital in London had 100 patients and 20 of them should have gone to Cambridge, its proportion is 20%. In the example below, patient 2's nearest hospital may well have been the one at (ulon1, ulat1), where u stands for neurosurgical unit (= hospital).
I have the latitude and longitude of the patients and the hospitals. I can't show the actual patient data because of confidentiality.
Essentially my dataframe looks like this:
import pandas as pd

d = {'patient_ID': [0, 1, 2, 3, 5],
     'patient_lon': ['plon1', 'plon2', 'plon3', 'plon4', 'plon5'], 'patient_lat': ['plat1', 'plat2', 'plat3', 'plat4', 'plat5'],
     'unit_lon': ['ulon1', 'ulon2', 'ulon3', 'ulon4', 'ulon5'], 'unit_lat': ['ulat1', 'ulat2', 'ulat3', 'ulat4', 'ulat5']}
pd.DataFrame(data=d)
| patient_ID | patient_lon | patient_lat | unit_lon | unit_lat |
|------------|-------------|-------------|----------|----------|
| 0 | plon1 | plat1 | ulon1 | ulat1 |
| 1 | plon2 | plat2 | ulon2 | ulat2 |
| 2 | plon3 | plat3 | ulon3 | ulat3 |
| 3 | plon4 | plat4 | ulon4 | ulat4 |
| 5 | plon5 | plat5 | ulon5 | ulat5 |
I have used the Haversine method to calculate distance from the patient to the hospital they attended.
How can I use that to calculate the distances to all 24 hospitals and take the minimum as the 'local' one? (They all provide neurosurgery, which is what I am interested in.) I then want to compare that to the one they actually went to, in a new dataframe column.
BTW I am a surgeon so a novice here.
My answer below reformats the data slightly, storing latitude and longitude as tuples in some of the columns. I hope this is OK; if not, please respond and we'll work up the answer.
1. Simulate some plausible patient locations
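Since the real patient coordinates are confidential, a minimal sketch of this step could draw random points from a rough bounding box. The box limits, the `simulate_patients` helper and the seed are all made up for illustration; none of them come from your data.

```python
import random

# Hypothetical bounding box, very roughly covering part of England.
# These limits are an assumption for illustration only.
LAT_RANGE = (50.5, 55.0)
LON_RANGE = (-3.5, 1.0)


def simulate_patients(n, seed=0):
    """Return n (lat, lon) tuples drawn uniformly from the bounding box."""
    rng = random.Random(seed)  # seeded so the fake data is reproducible
    return [
        (rng.uniform(*LAT_RANGE), rng.uniform(*LON_RANGE))
        for _ in range(n)
    ]


patient_locs = simulate_patients(5)
```

Each location is a `(lat, lon)` tuple, which is the order the later steps assume.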
2. Store locations of hospitals
Next up we'll store the names and lat/long coordinates of the hospitals. In your example above this would be your 24 UK hospitals; again, I've just made some things up here.
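Something along these lines would do for the lookup table. The three units and their coordinates are stand-ins (approximate city-centre positions), not your real neurosurgical units, and `hospital_location` is just a hypothetical convenience helper.

```python
# Hypothetical units keyed by name, each stored as a (lat, lon) tuple.
# Approximate city-centre coordinates, used purely as placeholders;
# swap in the real coordinates of the 24 units.
hospitals = {
    "London": (51.5074, -0.1278),
    "Cambridge": (52.2053, 0.1218),
    "Manchester": (53.4808, -2.2426),
}


def hospital_location(name):
    """Look up the (lat, lon) tuple for a hospital key."""
    return hospitals[name]
```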
3. Assemble dataframe
Now we use the data above to create some lists of data and a dataframe.
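A sketch of how that assembly might look, assuming pandas is available. The patients, the hospitals and the column names (`patient_loc`, `hospital_attended`, `hospital_loc`) are invented for this example.

```python
import pandas as pd

# Hypothetical hospitals as (lat, lon); placeholders for the real 24 units.
hospitals = {
    "London": (51.5074, -0.1278),
    "Cambridge": (52.2053, 0.1218),
    "Manchester": (53.4808, -2.2426),
}

# Made-up patients: home location as (lat, lon) and the unit they attended.
patient_locs = [(51.45, -0.10), (52.10, 0.20), (53.40, -2.00),
                (52.30, 0.05), (51.60, -0.30)]
attended = ["London", "London", "Manchester", "Cambridge", "London"]

df = pd.DataFrame({
    "patient_ID": list(range(len(patient_locs))),
    "patient_loc": patient_locs,
    "hospital_attended": attended,
})

# Attach the coordinates of the attended hospital as a tuple column.
df["hospital_loc"] = [hospitals[name] for name in df["hospital_attended"]]
```

Starting from your layout, the tuple column could be built with something like `list(zip(df["patient_lat"], df["patient_lon"]))`.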
4. Find nearest hospital
We write a function for finding the nearest hospital. This is probably a bit bespoke to our example. As mentioned in the comments above, haversine is a very convenient library for this. The function returns the key of the nearest hospital, which we can look up in our hospitals dict.
5. Compute distances
Assign new columns in the dataframe holding the nearest hospital for each patient. Transform is a bit faster than apply; ideally we'd probably use a numpy vectorized function, but this might be fast enough for your use case. If not, write back and we can take a look.
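Putting steps 4 and 5 together, a rough sketch might look like the following. To keep it self-contained I've written the haversine formula by hand (`haversine_km`) instead of importing the haversine library; the hospitals, patients, column names and the `bypassed` flag are all assumptions for illustration.

```python
import math

import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Hypothetical hospitals as (lat, lon); placeholders for the real 24 units.
hospitals = {
    "London": (51.5074, -0.1278),
    "Cambridge": (52.2053, 0.1218),
    "Manchester": (53.4808, -2.2426),
}


def nearest_hospital(loc):
    """Return the key of the hospital closest to the (lat, lon) point loc."""
    return min(hospitals, key=lambda name: haversine_km(loc, hospitals[name]))


# Made-up patients: home location and the unit they actually attended.
df = pd.DataFrame({
    "patient_loc": [(51.45, -0.10), (52.10, 0.20), (53.40, -2.00),
                    (52.30, 0.05), (51.60, -0.30)],
    "hospital_attended": ["London", "London", "Manchester",
                          "Cambridge", "London"],
})

# Nearest unit per patient and the distance to it.
df["nearest_hospital"] = df["patient_loc"].transform(nearest_hospital)
df["nearest_dist_km"] = [
    haversine_km(loc, hospitals[name])
    for loc, name in zip(df["patient_loc"], df["nearest_hospital"])
]

# True where the patient attended a unit other than their nearest one.
df["bypassed"] = df["hospital_attended"] != df["nearest_hospital"]

# Proportion of each hospital's patients for whom it was not the nearest unit.
bypass_rate = df.groupby("hospital_attended")["bypassed"].mean()
```

In this toy data one of the three London patients lives nearer to Cambridge, so London's bypass proportion comes out at one third. `bypass_rate` is the per-hospital figure you described in the question.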
On a related/unrelated note, I'm writing from a neurological ward.