Find the nearest or the k nearest line to a given point using spatial index

193 Views Asked by At

I have a layer of streets representing streets and some points which are specific events. I want to find the nearest (the first 4) street(s) to each event. My question is about indexing. I can use sindex on the line dataframe but I am not sure how I can index the points. At the moment, I am using a 100m offset for each point and create a bounding box and then do the intersect and calculate the distance. But I would prefer a solution where I do not manipulate the index for points. Since I have two big datasets for both line and points, I am looking for an efficient solution.

points_proj =  points.to_crs('EPSG:2952')
lines_proj =  lines.to_crs('EPSG:2952')

lines_proj.sindex

offset = 100
bbox = points_proj.bounds + [-offset, -offset, offset, offset]

hits = bbox.apply(lambda row: list(lines_proj.sindex.intersection(row)), axis=1)

tmp = pd.DataFrame({
# index of points table
"pt_idx": np.repeat(hits.index, hits.apply(len)),
# ordinal position of line - access via iloc later
"line_i": np.concatenate(hits.values)
})

# Join back to the lines on line_i; we use reset_index() to 
# give us the ordinal position of each line
tmp = tmp.join(lines_proj.reset_index(drop=True), on="line_i")

# Join back to the original points to get their geometry
# rename the point geometry as "point"
tmp = tmp.join(points_proj.geometry.rename("point"), on="pt_idx")

# Convert back to a GeoDataFrame, so we can do spatial ops
tmp = gpd.GeoDataFrame(tmp, geometry="line_geom", crs=points_proj.crs)

tmp["snap_dist"] = tmp.geometry.distance(gpd.GeoSeries(tmp.point))
tmp["snap_dist"].sort_values()

# Discard any lines that are greater than tolerance from points
tmp_tolerance_100 = tmp.loc[tmp.snap_dist <= 100]
# Sort on ascending snap distance, so that closest goes to top
tmp_tolerance_100 = tmp_tolerance_100.sort_values(by=["snap_dist"])

# group by the index of the points and take the first, which is the
# closest line 
closest_tolerance_100 = tmp_tolerance_100.groupby("pt_idx").first()
# construct a GeoDataFrame of the closest lines
closest_tolerance_100 = gpd.GeoDataFrame(closest_tolerance_100, geometry="line_geom")

This solution is based on snapping problem.

I am basically looking for a solution where I do not need to define an offset for points.

offset = 100
bbox = points_proj.bounds + [-offset, -offset, offset, offset]

and something that is fast for two datasets where source is around 700,000 points and candidates are around 15000 lines.

I also saw an efficient solution using scikitlearn and geopandas, but it's for two point layers.

0

There are 0 best solutions below