How do I calculate the Euclidean distance in meters to the nearest neighbour for each coordinate pair in a Pandas DataFrame?

Question

Asked by Dmitri Ilin on 29 January 2024 at 15:04

I have a DataFrame like this:

index	place id	var_lat_fact	var_lon_fact
0	167312091448	5.6679820000	-0.0144950000
1	167312091448	5.6686320000	-0.0157910000
2	167312091448	5.6653530000	-0.0181980000
3	167312091448	5.6700970000	-0.0191400000
4	167312091448	5.6689810000	-0.0104040000

For each coordinate pair (lat, lon), I'd like to calculate the Euclidean distance to its nearest neighbour within the DataFrame, so that each point gets an additional column (say, nearest_neighbour_dist) holding that distance in meters.

Something like this:

index	place id	var_lat_fact	var_lon_fact	nearest_neighbour_dist
0	167312091448	5.6679820000	-0.0144950000	160.588370
1	167312091448	5.6686320000	-0.0157910000	160.588370
2	167312091448	5.6653530000	-0.0181980000	451.525301
3	167312091448	5.6700970000	-0.0191400000	404.794908
4	167312091448	5.6689810000	-0.0104040000	466.104453

Just can't get my head around this... Any help would be greatly appreciated.

Original Q&A

There are 2 best solutions below

Michael Gruner On 29 January 2024 at 15:42

First, you can't compute meaningful Euclidean distances directly on geographic coordinates (latitude and longitude in degrees); you need to convert the points to Cartesian coordinates first. Also, are you sure you want the Euclidean distance? The geodesic distance seems more natural for this problem: the Euclidean distance between two Cartesian points is the straight-line distance "through" the Earth, while the geodesic distance follows the Earth's curved surface, as if you were walking. For points only a few hundred meters apart, as in your data, the two differ by far less than a millimeter.
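To put a number on the "through the Earth" versus "over the surface" difference, here is a minimal sketch assuming a perfectly spherical Earth of radius 6,371 km (`EARTH_RADIUS_M` and `chord_from_arc` are illustrative names, not from any library). It converts a surface arc length into the corresponding straight-line chord using chord = 2R·sin(arc / 2R).

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; assumes a spherical Earth


def chord_from_arc(arc_m: float, radius_m: float = EARTH_RADIUS_M) -> float:
    """Straight-line ("through the Earth") distance for a given surface arc length."""
    central_angle = arc_m / radius_m  # angle at the Earth's centre, in radians
    return 2 * radius_m * math.sin(central_angle / 2)


# Compare a question-sized distance (500 m) with a continental one (1000 km)
for arc in (500.0, 1_000_000.0):
    chord = chord_from_arc(arc)
    print(f"arc={arc:>12.1f} m  chord={chord:>14.6f} m  difference={arc - chord:.6f} m")
```

At the few-hundred-meter scale of the question the difference is well below a micrometer; it only becomes noticeable (on the order of a kilometer) at distances of around 1000 km.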

Distance to Nearest Neighbor with Euclidean Distance

Convert to Cartesian coordinates (NumPy's trigonometric functions expect radians, so convert from degrees first):

import pandas as pd
import numpy as np

df = pd.read_csv('path_to_your_csv.csv')

earth_radius = 6371000  # mean Earth radius in meters

# np.cos / np.sin work in radians, so convert the degree values first
lat = np.deg2rad(df['var_lat_fact'])
lon = np.deg2rad(df['var_lon_fact'])

df['x'] = earth_radius * np.cos(lat) * np.cos(lon)
df['y'] = earth_radius * np.cos(lat) * np.sin(lon)
df['z'] = earth_radius * np.sin(lat)
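The degree-to-radian conversion matters a lot here. The sketch below (with a hypothetical helper `latlon_to_xyz`, spherical Earth assumed) converts the first two points of the question both ways: with the conversion it recovers the roughly 160.6 m separation, while feeding the raw degree values into `np.cos`/`np.sin` produces a distance several kilometers off.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # spherical-Earth assumption


def latlon_to_xyz(lat, lon, degrees=True, radius_m=EARTH_RADIUS_M):
    """Convert latitude/longitude to Cartesian x, y, z in meters.

    With degrees=False the inputs are used as-is, i.e. treated as radians.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    if degrees:
        lat, lon = np.deg2rad(lat), np.deg2rad(lon)  # np.cos / np.sin expect radians
    return np.column_stack((
        radius_m * np.cos(lat) * np.cos(lon),
        radius_m * np.cos(lat) * np.sin(lon),
        radius_m * np.sin(lat),
    ))


# First two points from the question (degrees)
lat = [5.667982, 5.668632]
lon = [-0.014495, -0.015791]

xyz = latlon_to_xyz(lat, lon)
correct = np.linalg.norm(xyz[0] - xyz[1])

# The bug: degree values passed straight into the trig functions
xyz_wrong = latlon_to_xyz(lat, lon, degrees=False)
wrong = np.linalg.norm(xyz_wrong[0] - xyz_wrong[1])

print(f"with deg2rad: {correct:.1f} m, without: {wrong:.1f} m")
```

Every converted point should also lie at distance `EARTH_RADIUS_M` from the origin, which is a cheap sanity check on the conversion.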

Compute the distance between all pairs of points:

from scipy.spatial import distance_matrix

# Create a matrix of all points
points = df[['x', 'y', 'z']].to_numpy()

# Compute the distance matrix
dist_matrix = distance_matrix(points, points)

# Set the diagonal to infinity to ignore zero distance to self
np.fill_diagonal(dist_matrix, np.inf)

# Find the minimum distance for each point
df['nearest_neighbor_dist'] = np.min(dist_matrix, axis=1)

# Drop the Cartesian coordinates as they are no longer needed
df = df.drop(['x', 'y', 'z'], axis=1)

df['nearest_neighbor_dist'] now contains the Euclidean (straight-line) distance in meters to the nearest neighbor.
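`distance_matrix` builds a full n × n matrix, so memory grows quadratically with the number of rows. A swapped-in alternative, not part of the answer above, is to put the same Cartesian points into SciPy's `cKDTree` and ask for the two nearest hits per point. This is a sketch assuming a spherical Earth; the five points are the ones from the question.

```python
import numpy as np
from scipy.spatial import cKDTree

EARTH_RADIUS_M = 6_371_000  # spherical-Earth assumption

# The five points from the question (latitude, longitude in degrees)
lat_deg = np.array([5.667982, 5.668632, 5.665353, 5.670097, 5.668981])
lon_deg = np.array([-0.014495, -0.015791, -0.018198, -0.019140, -0.010404])

# Convert to Cartesian coordinates in meters (radians for the trig functions)
lat, lon = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
points = np.column_stack((
    EARTH_RADIUS_M * np.cos(lat) * np.cos(lon),
    EARTH_RADIUS_M * np.cos(lat) * np.sin(lon),
    EARTH_RADIUS_M * np.sin(lat),
))

tree = cKDTree(points)

# k=2: the first hit is each point itself (distance 0), the second its nearest neighbour
dists, idx = tree.query(points, k=2)
nearest_neighbor_dist = dists[:, 1]
nearest_neighbor_idx = idx[:, 1]

print(nearest_neighbor_dist)
print(nearest_neighbor_idx)
```

With a DataFrame, `df['nearest_neighbor_dist'] = dists[:, 1]` after building `points` from `df[['x', 'y', 'z']].to_numpy()` gives the same column as the distance-matrix version.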

Distance to Nearest Neighbor with Geodesic Distance

Compute the distance to the nearest neighbor for each point using geopy:

import pandas as pd
from geopy.distance import geodesic

# Convert the latitude and longitude from your DataFrame to a list of (lat, lon) tuples
coordinates = list(zip(df['var_lat_fact'], df['var_lon_fact']))

# Initialize a list to hold the nearest neighbor distances
nearest_neighbor_dists = []

# Calculate the geodesic distance from each point to every other point
for i in range(len(coordinates)):
    distances = [geodesic(coordinates[i], coordinates[j]).meters for j in range(len(coordinates)) if i != j]
    # Keep the smallest one
    nearest_neighbor_dists.append(min(distances))

df['nearest_neighbor_dist'] = nearest_neighbor_dists

The df['nearest_neighbor_dist'] column now contains the geodesic distance in meters to the nearest neighbor. Note that this loop computes n·(n−1) geodesic distances, so it gets slow for large DataFrames.
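If the Python loop is too slow, one option is to vectorize with NumPy broadcasting. Note this is a swapped-in technique: it uses the haversine formula on a sphere of radius 6,371 km, whereas geopy's `geodesic` works on the WGS-84 ellipsoid by default, so results can differ slightly. `nearest_neighbor_haversine` is a hypothetical helper name, and it still builds an n × n matrix in memory.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # spherical approximation, unlike geopy's ellipsoidal geodesic


def nearest_neighbor_haversine(lat_deg, lon_deg, radius_m=EARTH_RADIUS_M):
    """Great-circle (haversine) distance in meters from each point to its nearest other point."""
    lat = np.deg2rad(np.asarray(lat_deg, dtype=float))
    lon = np.deg2rad(np.asarray(lon_deg, dtype=float))

    # Broadcast to n x n matrices of pairwise differences
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]

    # Haversine formula
    a = np.sin(dlat / 2) ** 2 + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2
    dist = 2 * radius_m * np.arcsin(np.sqrt(a))

    np.fill_diagonal(dist, np.inf)  # ignore each point's zero distance to itself
    return dist.min(axis=1)


lat = [5.667982, 5.668632, 5.665353, 5.670097, 5.668981]
lon = [-0.014495, -0.015791, -0.018198, -0.019140, -0.010404]
result = nearest_neighbor_haversine(lat, lon)
print(result)
```

On a DataFrame this would be `df['nearest_neighbor_dist'] = nearest_neighbor_haversine(df['var_lat_fact'], df['var_lon_fact'])`.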

mozway · Accepted Answer · 29 January 2024 at 15:36

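You can use scikit-learn's NearestNeighbors with the haversine metric, which expects [latitude, longitude] in radians and returns great-circle distances in radians: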

from sklearn.neighbors import NearestNeighbors
from numpy import deg2rad

# set up the nearest neighbors search with the haversine (great-circle) metric
neigh = NearestNeighbors(n_neighbors=1, metric='haversine')
data = deg2rad(df[['var_lat_fact', 'var_lon_fact']])  # haversine expects radians
neigh.fit(data)

# find the closest two points:
# the closest is the point itself (distance 0), the second is the closest non-self point
distances, _ = neigh.kneighbors(data, n_neighbors=2, return_distance=True)

# distances are in radians; multiply by the Earth's radius (6371 km) to get meters
df['nearest_neighbour_dist'] = distances[:, -1] * 6371 * 1000

Output:

   index      place_id  var_lat_fact  var_lon_fact  nearest_neighbour_dist
0      0  167312091448      5.667982     -0.014495              160.588370
1      1  167312091448      5.668632     -0.015791              160.588370
2      2  167312091448      5.665353     -0.018198              451.525301
3      3  167312091448      5.670097     -0.019140              404.794908
4      4  167312091448      5.668981     -0.010404              466.104453

Points on a map

I wanted to double-check the computations: the distance from point 1 to point 2 (indices 0 and 1 in your data) is indeed about 160.6 meters.
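An independent back-of-the-envelope check of that 160.6 m figure is the equirectangular (flat-Earth) approximation, which is accurate for points this close together: the east-west offset is R·Δlon·cos(mean latitude) and the north-south offset is R·Δlat, both in radians. This is a sketch using the same 6,371 km spherical radius as the answer; it is not what scikit-learn computes internally.

```python
import math

EARTH_RADIUS_M = 6_371_000  # same spherical radius as in the answer

lat1, lon1 = 5.667982, -0.014495  # index 0
lat2, lon2 = 5.668632, -0.015791  # index 1

mean_lat = math.radians((lat1 + lat2) / 2)
dx = EARTH_RADIUS_M * math.radians(lon2 - lon1) * math.cos(mean_lat)  # east-west offset, meters
dy = EARTH_RADIUS_M * math.radians(lat2 - lat1)                       # north-south offset, meters
approx = math.hypot(dx, dy)

print(f"{approx:.1f} m")
```

For a separation of a few hundred meters, the flat approximation and the haversine result agree to well within a centimeter.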
