In a pandas DataFrame, how do I add a new column whose values are computed from other columns in the same row?

Given a data frame like the one below:

index      H_Lat      H_Lon      W_Lat      W_Lon
    0  18.447259  73.896742  18.534579  73.819043
    1  18.523069  73.842460  18.491357  73.851985
    2  18.511014  73.864071        NaN        NaN

I want to add a new column containing the distance between the two places (H_Lat, H_Lon) and (W_Lat, W_Lon):

index      H_Lat      H_Lon      W_Lat      W_Lon   Distance
    0  18.447259  73.896742  18.534579  73.819043  12.678631
    1  18.523069  73.842460  18.491357  73.851985   3.651333
    2  18.511014  73.864071        NaN        NaN        NaN

I followed the answer to "Getting distance between two points based on latitude/longitude" to calculate the distance between two coordinates.

But I am not sure how to compute this for each row and store the result in a new column of the data frame.

There are 2 best solutions below

Answer by Bhavesh Neekhra:

Define a function that calculates the distance:

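import geopy.distance
import numpy as np

def get_geo_Distance(row):
    # Build (latitude, longitude) pairs for the two places in this row.
    coords_1 = (row['H_Lat'], row['H_Lon'])
    coords_2 = (row['W_Lat'], row['W_Lon'])
    try:
        distance = geopy.distance.geodesic(coords_1, coords_2).km
    except ValueError:
        # Missing (NaN) coordinates make geopy raise ValueError; keep NaN for that row.
        distance = np.nan

    return distance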

The following code applies the function to every row (axis=1) and stores the result in a new column 'distance':

distance_df['distance'] = distance_df.apply(get_geo_Distance, axis=1)
print(distance_df)
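
For readers who want to try the row-wise `apply` pattern without installing geopy, here is a minimal sketch that swaps in a pure-Python haversine formula for `geopy.distance.geodesic`. The names `haversine_km`, `row_distance`, and `EARTH_RADIUS_KM` are made up for this illustration, and the 6371 km radius is an assumption, so the values will differ slightly from the geodesic ones in the question.

```python
import math

import pandas as pd

# Assumed mean Earth radius in km (illustrative; not taken from the answer above).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def row_distance(row):
    """Return the distance for one row, or NaN if any coordinate is missing."""
    coords = row[["H_Lat", "H_Lon", "W_Lat", "W_Lon"]]
    if coords.isna().any():
        return float("nan")
    return haversine_km(*coords)


# Sample data copied from the question.
distance_df = pd.DataFrame({
    "H_Lat": [18.447259, 18.523069, 18.511014],
    "H_Lon": [73.896742, 73.842460, 73.864071],
    "W_Lat": [18.534579, 18.491357, float("nan")],
    "W_Lon": [73.819043, 73.851985, float("nan")],
})

# axis=1 passes each row (as a Series) to the function.
distance_df["distance"] = distance_df.apply(row_distance, axis=1)
print(distance_df)
```

Because `apply` with `axis=1` calls the Python function once per row, this pattern is convenient but can be slow on large frames.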
Answer by mozway:

You can vectorize the haversine formula with numpy:

import numpy as np

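def haversine(lat1, lon1, lat2, lon2):
    # Inputs are in degrees; the trigonometric functions need radians.
    lat1 = np.radians(lat1)
    lon1 = np.radians(lon1)
    lat2 = np.radians(lat2)
    lon2 = np.radians(lon2)
    # 6367 is the approximate Earth radius in km, so the result is in km.
    return np.arcsin(np.sqrt(
              np.sin((lat2-lat1)/2)**2
            + np.cos(lat1)
            * np.cos(lat2)
            * np.sin((lon2-lon1)/2)**2
           ))*2*6367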

df['Distance'] = haversine(df['H_Lat'], df['H_Lon'], df['W_Lat'], df['W_Lon'])

Or, a variant passing a numpy array as input:

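def haversine(arr):
    '''Takes a numpy array with lat1, lon1, lat2, lon2 columns as input'''
    # Transpose so that each coordinate column unpacks into its own 1-D array.
    lat1, lon1, lat2, lon2 = np.radians(arr).T
    # 6367 is the approximate Earth radius in km, so the result is in km.
    return np.arcsin(np.sqrt(
              np.sin((lat2-lat1)/2)**2
            + np.cos(lat1)
            * np.cos(lat2)
            * np.sin((lon2-lon1)/2)**2
           ))*2*6367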

df['Distance'] = haversine(df[['H_Lat', 'H_Lon', 'W_Lat', 'W_Lon']].to_numpy())

Output:

   index      H_Lat      H_Lon      W_Lat      W_Lon   Distance
0      0  18.447259  73.896742  18.534579  73.819043  12.696821
1      1  18.523069  73.842460  18.491357  73.851985   3.664156
2      2  18.511014  73.864071        NaN        NaN        NaN
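
To reproduce the numbers above end to end, here is a self-contained sketch that builds the sample frame from the question and applies the vectorized formula. The `radius_km` parameter is an addition for illustration only; it defaults to the 6367 used in this answer. The results differ slightly from the values in the question because haversine models the Earth as a sphere, while geopy's `geodesic` uses an ellipsoid.

```python
import numpy as np
import pandas as pd


def haversine(lat1, lon1, lat2, lon2, radius_km=6367):
    """Vectorized haversine distance in km; inputs are in degrees.

    radius_km is a hypothetical parameter added for this sketch.
    """
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))


# Sample data copied from the question.
df = pd.DataFrame({
    "H_Lat": [18.447259, 18.523069, 18.511014],
    "H_Lon": [73.896742, 73.842460, 73.864071],
    "W_Lat": [18.534579, 18.491357, np.nan],
    "W_Lon": [73.819043, 73.851985, np.nan],
})

# Whole columns go in at once; NaN inputs propagate to NaN outputs,
# so no explicit missing-value handling is needed.
df["Distance"] = haversine(df["H_Lat"], df["H_Lon"], df["W_Lat"], df["W_Lon"])
print(df.round(6))
```

Since the arithmetic runs on whole columns inside NumPy, there is no per-row Python function call as there is with `apply(..., axis=1)`.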