My current goal is to find the distance between two points based on a latitude and longitude system, in order to track the trajectory of a flight. I have a pandas dataframe that contains changing latitude and longitude values. In order to find the distance between these points, I use the haversine distance function that takes these values as input in order to find the distance in kilometers.
I first tried to implement a for loop that iterates over the length of the flight and calculates the distance similar to the code below:
for i in range(len(df) - 1):
    row1 = df.iloc[i]
    row2 = df.iloc[i + 1]
    result = haversine_distance(row1, row2)
However, the dataset is very large, and because the loop was too slow I moved to a different strategy.
I then tried to implement a rolling window using the df.rolling function in pandas, along with a .apply with a lambda function as below:
df['DISTANCE'] = df[['Latitude', 'Longitude']].rolling(window=2).apply(lambda x: haversine_distance(x), raw = True)
My understanding of what happens here is that a 2D array (from raw=True) is passed into the haversine function with the 4 latitude and longitude values from the window.
However, the function receives a 1D array of 2 values from a single column rather than a 2D array of 4 values from 2 columns. What I mean by this is:
df = pd.DataFrame({'Latitude': [40.7128, 37.7749, 34.0522],
                   'Longitude': [-74.0060, -122.4194, -118.2437]})
With the dataframe shown above, I would expect the first window to be the array [[40.7128, -74.0060], [37.7749, -122.4194]].
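To check this, here is a small sketch that records what rolling(...).apply(..., raw=True) hands to the function. The `record` helper and the `seen` list are hypothetical names, added only for illustration. As far as I can tell, each call receives a 1D window from a single column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Latitude': [40.7128, 37.7749, 34.0522],
                   'Longitude': [-74.0060, -122.4194, -118.2437]})

seen = []  # hypothetical: collects every window passed to the function


def record(window):
    # With raw=True, `window` arrives as a NumPy array.
    seen.append(window.copy())
    return 0.0  # rolling.apply expects one scalar per window


df[['Latitude', 'Longitude']].rolling(window=2).apply(record, raw=True)

for window in seen:
    print(window.shape, window)
```

Every recorded window has shape (2,), that is, two consecutive values from one column, never a (2, 2) block spanning both columns.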
How can I fix my code or go about it differently in order to get these values? Attached below is the haversine function:
def haversine_distance(ndarray):
    lat1, lat2 = ndarray[0][0], ndarray[0][1]
    lon1, lon2 = ndarray[1][0], ndarray[1][1]
    # Convert latitude and longitude from degrees to radians
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6371 * c
    return km
And this is the desired output:
df = pd.DataFrame({'Latitude': [40.7128, 37.7749, 34.0522],
                   'Longitude': [-74.0060, -122.4194, -118.2437],
                   'DISTANCE': [0, 4129.0861, 559.1205]})
You need to vectorize your haversine function, then craft an array with 4 columns in the correct order (with shift+concat+to_numpy) and pass this to the function.
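A minimal sketch of that approach, assuming the 4 columns are ordered lat1, lat2, lon1, lon2 (the variable name `pairs` is an illustration, not part of the question's code). The function body is the question's formula, rewritten to operate on whole columns at once:

```python
import numpy as np
import pandas as pd


def haversine_distance(ndarray):
    # Expects shape (n, 4) with columns ordered lat1, lat2, lon1, lon2 (degrees).
    lat1, lat2, lon1, lon2 = np.radians(ndarray).T
    # Haversine formula, applied element-wise to all rows
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    c = 2 * np.arcsin(np.sqrt(a))
    return 6371 * c  # Earth radius in km


df = pd.DataFrame({'Latitude': [40.7128, 37.7749, 34.0522],
                   'Longitude': [-74.0060, -122.4194, -118.2437]})

coords = df[['Latitude', 'Longitude']]
# Put each row next to the previous row, then reorder the columns to
# Latitude, previous Latitude, Longitude, previous Longitude.
pairs = (pd.concat([coords, coords.shift()], axis=1)
           .iloc[:, [0, 2, 1, 3]]
           .to_numpy())

df['DISTANCE'] = haversine_distance(pairs)
print(df)
```

The first row has no previous point, so its distance is NaN. If 0 is preferred, as in the desired output, `df['DISTANCE'].fillna(0)` should do it.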
NB. Instead of reordering the columns with .iloc[:, [0,2,1,3]] you could also use lat1, lon1, lat2, lon2 = ndarray.T in the function.
Alternatively, you could write a function that takes the df as input.
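A minimal sketch of such a function, assuming the columns are named Latitude and Longitude as in the question. The name `haversine_from_df` and its `lat_col`/`lon_col` parameters are hypothetical. It shifts the columns itself, so no intermediate 4-column array is needed:

```python
import numpy as np
import pandas as pd


def haversine_from_df(df, lat_col='Latitude', lon_col='Longitude'):
    # Current point, converted from degrees to radians
    lat2 = np.radians(df[lat_col])
    lon2 = np.radians(df[lon_col])
    # Previous point: the same columns shifted down by one row
    lat1 = lat2.shift()
    lon1 = lon2.shift()
    # Haversine formula, evaluated for all rows at once
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    return 6371 * 2 * np.arcsin(np.sqrt(a))  # distance in km


df = pd.DataFrame({'Latitude': [40.7128, 37.7749, 34.0522],
                   'Longitude': [-74.0060, -122.4194, -118.2437]})

# The first row has no predecessor; fillna(0) matches the desired output.
df['DISTANCE'] = haversine_from_df(df).fillna(0)
print(df)
```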