Average distance within group in pandas

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({
    'id': ['A','A','B','B','B'],
    'x': [1,1,2,2,3],
    'y': [1,2,2,3,3]
})

The output I want is the average pairwise distance between the points in each group. In this example:

group A: (distance((1,1),(1,2))) /1 = 1

group B: (distance((2,2),(2,3)) + distance((2,3),(3,3)) + distance((2,2),(3,3))) /3 = 1.138
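
As a quick check of the arithmetic above, here is a minimal sketch in plain Python that averages the Euclidean distance over every unordered pair of points. The helper name `mean_pairwise_distance` and the variables `group_a` and `group_b` are made up for this illustration and are not part of the question.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(combinations(points, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Points taken from the example dataframe
group_a = [(1, 1), (1, 2)]
group_b = [(2, 2), (2, 3), (3, 3)]

print(mean_pairwise_distance(group_a))            # 1.0
print(round(mean_pairwise_distance(group_b), 3))  # 1.138
```

Group B has three pairs with distances 1, 1 and sqrt(2), so the mean is (2 + sqrt(2)) / 3, which rounds to 1.138.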

I can calculate the distance using np.linalg.norm, but I am confused about how to use it with pandas groupby. Thank you.

Note: one idea I had is to first build a dataframe that contains the pairs of points within each group (this is where I am stuck). After that, I would only need to calculate the distance for each pair and take the groupby mean.
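
The pair-dataframe idea can be sketched with a self-merge on `id`, under the assumption that the data is small enough for the merge to fit in memory. The column names `row`, `dist` and `avg_distance` and the `_1`/`_2` suffixes are chosen for this illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['A', 'A', 'B', 'B', 'B'],
    'x': [1, 1, 2, 2, 3],
    'y': [1, 2, 2, 3, 3],
})

# Give every row a number so each unordered pair can be kept exactly once.
pts = df.rename_axis('row').reset_index()

# Self-merge on 'id' pairs every point with every point of the same group.
pairs = pts.merge(pts, on='id', suffixes=('_1', '_2'))

# Drop self-pairs and mirrored duplicates: keep only row_1 < row_2.
pairs = pairs[pairs['row_1'] < pairs['row_2']].copy()

# Euclidean distance between the two points of each pair.
pairs['dist'] = np.hypot(pairs['x_1'] - pairs['x_2'],
                         pairs['y_1'] - pairs['y_2'])

avg = pairs.groupby('id')['dist'].mean().reset_index(name='avg_distance')
print(avg)
```

A group with a single point produces no pairs, so it would be missing from `avg` rather than showing up as NaN. The merge also creates n squared rows per group before filtering, which can get expensive for large groups.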

Best answer, by PaulS

A possible solution, based on numpy broadcasting:

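import numpy as np

def calc_avg_distance(group):
    # Coordinates of the group's points as an (n, 2) array
    x = group[['x', 'y']].values
    # Broadcasting gives (n, n, 2) differences, reduced to an (n, n) distance matrix
    dist_matrix = np.sqrt(((x - x[:, np.newaxis])**2).sum(axis=2))
    # Mask the zero self-distances so they do not enter the mean
    np.fill_diagonal(dist_matrix, np.nan)
    # The matrix is symmetric, so this equals the mean over unique pairs
    avg_distance = np.nanmean(dist_matrix)
    return avg_distance


(df.groupby('id').apply(calc_avg_distance)
 .reset_index(name='avg_distance'))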

Output:

  id  avg_distance
0  A      1.000000
1  B      1.138071
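
The mean over the off-diagonal entries matches the mean over unique pairs because the distance matrix is symmetric, so every pair is counted exactly twice. A variant that reads only the upper triangle makes this explicit. This is a sketch, not part of the answer above: the function name `avg_pair_distance` is made up, and returning NaN for single-point groups is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd


def avg_pair_distance(group):
    pts = group[['x', 'y']].to_numpy(dtype=float)
    # (n, n, 2) array of coordinate differences via broadcasting
    diff = pts[:, None, :] - pts[None, :, :]
    dist_matrix = np.sqrt((diff ** 2).sum(axis=2))
    # Indices strictly above the diagonal: each unordered pair once
    i, j = np.triu_indices(len(pts), k=1)
    if len(i) == 0:
        # Assumption: a group with one point has no defined average
        return np.nan
    return dist_matrix[i, j].mean()


df = pd.DataFrame({
    'id': ['A', 'A', 'B', 'B', 'B'],
    'x': [1, 1, 2, 2, 3],
    'y': [1, 2, 2, 3, 3],
})

# Selecting the coordinate columns keeps the grouping column out of apply
result = (df.groupby('id')[['x', 'y']]
          .apply(avg_pair_distance)
          .reset_index(name='avg_distance'))
print(result)
```

For the example data this should give the same averages as the output above, 1.0 for group A and about 1.138071 for group B.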