How to calculate an average value based on K-nearest neighbors?


I would like to write a function to calculate an average 'z' value based on K nearest neighbors (in this case K=2). I have the indices but can someone help me write a function for calculating the average z value for all the neighbors?

This is what I have so far:

import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[6,-3, 0.1], [-5,-9, 0.5], [3,-7, 0.8], [-10,6, 0.5], [-4,-16, 0.9], [1,-0.5, 0]])
# X is an array containing x,y,z values
# nbrs is fitted on the x,y columns only

nbrs = NearestNeighbors(n_neighbors=2).fit(X[:, :2])
distances, indices = nbrs.kneighbors(X[:, :2])

print(indices)
# pseudocode for the printed result:
# [[0, index for neighbor1, index for neighbor2],
#  [1, index for neighbor1, index for neighbor2],
#  [2, index for neighbor1, index for neighbor2],
#  [3, index for neighbor1, index for neighbor2],
#  ......
#  etc. for all 6 points in X
# ]

Now that I have the indices, I'd like to calculate the average z value over each point's neighbors. I recognize there are only 2 neighbors here, so it is easy to average by hand, but if we changed it to 50 neighbors, can someone help me scale this up?


There are 2 best solutions below

Answer by proof-of-correctness:

To find the average z value of neighbors of each point in X, you can do:

all_z_pairs = [[X[index][2] for index in row] for row in indices]
mean_values = [sum(z_pair)/len(z_pair) for z_pair in all_z_pairs]

X[index] is a neighbor and X[index][2] is that neighbor's z-value. So all_z_pairs holds one list per point in X, containing the z-values of that point's neighbors. It works for any number of neighbors, not just pairs.

sum(z_pair)/len(z_pair) finds the mean. You can also do this to make it more readable:

from statistics import mean

...
mean_values = [mean(z_pair) for z_pair in all_z_pairs]

You can rewrite the all_z_pairs calculation as explicit loops if that makes it easier to understand:

all_z_pairs = []
for row in indices:
    z_values = []
    for index in row:
        z_values.append(X[index][2])
    all_z_pairs.append(z_values)

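The indices array has one row for each point in X, and each row lists the neighbors of that point. So the outer loop goes over the sets of neighbors and the inner loop goes over every neighbor in a set. Note that because kneighbors is queried with the same points it was fitted on, each row also contains the point's own index, so its own z-value is part of the average.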
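The same averaging can also be done without Python loops using NumPy integer-array indexing, which scales to 50 neighbors unchanged. This is a minimal sketch, not the answerer's code. It assumes the neighbors should be found from the x,y columns only, and it calls kneighbors() with no argument so that each point is left out of its own neighbor list. The names k, neighbor_z and mean_z are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[6, -3, 0.1], [-5, -9, 0.5], [3, -7, 0.8],
              [-10, 6, 0.5], [-4, -16, 0.9], [1, -0.5, 0]])

k = 2  # number of neighbors to average over (assumed; raise it for larger data)

# Fit on the x,y columns only, so z does not influence the distances
nbrs = NearestNeighbors(n_neighbors=k).fit(X[:, :2])

# With no argument, kneighbors() queries the fitted points and
# does not count a point as its own neighbor
_, indices = nbrs.kneighbors()

# indices has shape (n_points, k); indexing column 2 with it gives
# the z-value of every neighbor in one step
neighbor_z = X[indices, 2]

# Average across the neighbor axis: one mean per point
mean_z = neighbor_z.mean(axis=1)
print(mean_z)
```

If the point's own z-value should be included in the average, pass X[:, :2] to kneighbors instead; the indexing and the mean stay the same.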

Answer by Nick ODell:

If you want to predict a continuous value from the values of the nearest neighbors, you can use KNeighborsRegressor, which fits on the x,y coordinates and predicts z.

Example:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor
X = np.array([[6,-3, 0.1], [-5,-9, 0.5], [3,-7, 0.8], [-10,6, 0.5], [-4,-16, 0.9], [1,-0.5, 0]])
neigh = KNeighborsRegressor(n_neighbors=2, weights='uniform')
neigh.fit(X[:, :2], X[:, 2])
neigh.predict([[4, -7]])

Since you're asking for an average of all neighbors, I used weights='uniform'. An alternative to this is weights='distance', which gives closer neighbors more weight.
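To see that weights='uniform' really is a plain average, the prediction can be compared with a manual mean over the indices returned by the regressor's own kneighbors method. This is a small sketch under the same data as above; the names xy, z, query, idx and manual are illustrative and not part of the original answer.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[6, -3, 0.1], [-5, -9, 0.5], [3, -7, 0.8],
              [-10, 6, 0.5], [-4, -16, 0.9], [1, -0.5, 0]])
xy, z = X[:, :2], X[:, 2]

neigh = KNeighborsRegressor(n_neighbors=2, weights='uniform').fit(xy, z)

query = np.array([[4, -7]])
predicted = neigh.predict(query)

# kneighbors returns, for each query row, the indices of the
# nearest training points
_, idx = neigh.kneighbors(query)

# Manual average of the z-values of those neighbors
manual = z[idx].mean(axis=1)

print(predicted, manual)
```

The two printed values should agree. Keep in mind that predicting at a point that is itself in the training data will count that point's own z-value as one of the neighbors.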

See the scikit-learn documentation for KNeighborsRegressor.