Find the distance between 2 series of points in Pandas, Fastest Iteration


I have two sets of data. The first, called locations, contains the coordinates of fixed locations:

Table of fixed locations

And a second table of vehicle movements, called movements.

What would be the fastest way to go through both tables and find whether any movement is within a certain distance of a location, measured as the Euclidean distance between a movement point and a location point?

I am currently using a nested loop, which is incredibly slow. Both pandas DataFrames are first converted to lists of records using

locations_dict=locations.to_dict('records')
movements_dict=movements.to_dict('records')

then iterated via:

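import numpy as np

for movement in movements_dict:
    visit='no visit'
    for location in locations_dict:
        # Euclidean distance in degrees between this movement and this location
        distance = np.sqrt((location['Latitude']-movement['Lat'])**2+(location['Longitude']-movement['Lng'])**2)
        if distance < 0.05:
            visit=location['Location']
            break  # stop at the first location within range
    # If no location matched, this is the distance to the last location checked
    movement['distance']=distance
    movement['visit']=visit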

Is there any way to make this faster? The main issue is that this operation is a Cartesian product, so every added row increases the cost of the operation significantly.


There are 1 best solutions below

Claudio On

You can export the pandas columns directly to NumPy arrays, for example like this:

loc_lat=locations['Latitude' ].to_numpy()
loc_lon=locations['Longitude'].to_numpy()
mov_lat=movements['Lat'      ].to_numpy()
mov_lon=movements['Lng'      ].to_numpy()  # 'Lng' is the column name used in the question's loop

From here on, no loops are needed to obtain results, because NumPy operates on entire arrays at once. This should give a large speedup over looping in Python over dictionary values.
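As a rough illustration of what "working on entire arrays at once" could look like for this problem, here is a minimal sketch using NumPy broadcasting. The sample rows are made up; only the column names (Location, Latitude, Longitude, Lat, Lng) and the 0.05 threshold come from the question. It keeps the loop's "first location within range" rule for visit, but stores the distance to the nearest location, which is a small departure from the original loop.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names follow the question.
locations = pd.DataFrame({
    "Location": ["Depot", "Warehouse"],
    "Latitude": [51.50, 52.00],
    "Longitude": [-0.12, 0.50],
})
movements = pd.DataFrame({
    "Lat": [51.51, 40.00, 52.02],
    "Lng": [-0.10, 3.00, 0.49],
})

loc_lat = locations["Latitude"].to_numpy()
loc_lon = locations["Longitude"].to_numpy()
mov_lat = movements["Lat"].to_numpy()
mov_lon = movements["Lng"].to_numpy()

# Broadcasting a (n_movements, 1) column against a (n_locations,) row
# yields the full (n_movements, n_locations) distance matrix in one step.
dist = np.sqrt((mov_lat[:, None] - loc_lat) ** 2
               + (mov_lon[:, None] - loc_lon) ** 2)

within = dist < 0.05                # threshold from the question
has_visit = within.any(axis=1)      # True if any location is in range
first_hit = within.argmax(axis=1)   # index of first location in range (0 if none)

names = locations["Location"].to_numpy()
movements["visit"] = np.where(has_visit, names[first_hit], "no visit")
movements["distance"] = dist.min(axis=1)  # distance to the nearest location

print(movements)
```

The distance matrix holds one value per movement/location pair, so for very large tables it may be necessary to process the movements in chunks to keep memory use bounded.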

The following code example shows how to get an array with all pairs from two arrays:

import numpy as np
a = np.array([1,2,3])
b = np.array([4,5])
print( np.transpose([np.tile(a, len(b)), np.repeat(b,len(a))]) )
gives_as_print = """
[[1 4]
 [2 4]
 [3 4]
 [1 5]
 [2 5]
 [3 5]]"""
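To connect the pairing trick back to the question, the sketch below applies the same tile/repeat pattern to each coordinate and then computes one distance per pair. The coordinates are invented for illustration, and the variable names prefixed with pair_ are my own; only the 0.05 threshold comes from the question.

```python
import numpy as np

# Hypothetical coordinates, for illustration only.
loc_lat = np.array([0.0, 1.0])
loc_lon = np.array([0.0, 1.0])
mov_lat = np.array([0.0, 3.0, 1.0])
mov_lon = np.array([0.03, 4.0, 1.04])

n_loc, n_mov = len(loc_lat), len(mov_lat)

# Same tile/repeat pattern as above, applied per coordinate:
# locations cycle fastest, each movement is repeated n_loc times.
pair_loc_lat = np.tile(loc_lat, n_mov)
pair_loc_lon = np.tile(loc_lon, n_mov)
pair_mov_lat = np.repeat(mov_lat, n_loc)
pair_mov_lon = np.repeat(mov_lon, n_loc)

# One Euclidean distance per (movement, location) pair.
dist = np.sqrt((pair_loc_lat - pair_mov_lat) ** 2
               + (pair_loc_lon - pair_mov_lon) ** 2)

# Reshape so each row is a movement and each column a location.
dist = dist.reshape(n_mov, n_loc)

# True where a movement is within the threshold of at least one location.
near_any = (dist < 0.05).any(axis=1)
print(near_any)
```

Materializing every pair this way uses more memory than broadcasting, so it is probably best suited to moderately sized inputs.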