How to calculate an average value based on K-nearest neighbors?

61 Views Asked by Lucy At 06 March 2024 at 20:06

I would like to write a function to calculate an average 'z' value based on K nearest neighbors (in this case K=2). I have the indices but can someone help me write a function for calculating the average z value for all the neighbors?

This is what I have so far:

import numpy as np
from sklearn.neighbors import NearestNeighbors

# X is an array containing x,y,z values
X = np.array([[6,-3, 0.1], [-5,-9, 0.5], [3,-7, 0.8], [-10,6, 0.5], [-4,-16, 0.9], [1,-0.5, 0]])

# nbrs reads in the x,y values only
nbrs = NearestNeighbors(n_neighbors=2).fit(X[:, :2])
distances, indices = nbrs.kneighbors(X[:, :2])

print(indices)
# pseudocode of the output below
# [[0, index for neighbor1, index for neighbor2]
#  [1, index for neighbor1, index for neighbor2]
#  [2, index for neighbor1, index for neighbor2]
#  [3, index for neighbor1, index for neighbor2]
#  ......
#  etc. for all 6 points in X
# ]

Now that I have the indices, I'd like to calculate the average z value over the neighbors of each point. I recognize there are only 2 neighbors here, so it is easy to average by hand, but if we changed it to 50 neighbors, can someone help me scale this up?


There are 2 best solutions below

proof-of-correctness On 06 March 2024 at 20:28

To find the average z value of neighbors of each point in X, you can do:

all_z_pairs = [[X[index][2] for index in row] for row in indices]
mean_values = [sum(z_pair)/len(z_pair) for z_pair in all_z_pairs]

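X[index] is a neighbor and X[index][2] is that neighbor's z value. So all_z_pairs holds one list per point in X, containing the z values of that point's neighbors (the lists are only pairs because K=2; with K=50 each list would have 50 entries).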

sum(z_pair)/len(z_pair) finds the mean. You can also do this to make it more readable:

from statistics import mean

...
mean_values = [mean(z_pair) for z_pair in all_z_pairs]

You can also write the all_z_pairs calculation as explicit loops if that is easier to follow:

all_z_pairs = []
for row in indices:
    z_values = []  # z values of the neighbors of one point
    for index in row:
        z_values.append(X[index][2])
    all_z_pairs.append(z_values)

The indices array has one row for each point in X, and each row holds the indices of that point's neighbors. The outer loop iterates over the rows (one set of neighbors per point) and the inner loop iterates over the individual neighbors within a row.
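Since indices is a NumPy array, the same per-point average can probably be computed without Python loops by indexing the z column with it. The following is a minimal sketch, not part of the original answer; it assumes the neighbor search is fitted on the x,y columns only (as the question's comment suggests), and the names xy, z and k are introduced here for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.array([[6, -3, 0.1], [-5, -9, 0.5], [3, -7, 0.8],
              [-10, 6, 0.5], [-4, -16, 0.9], [1, -0.5, 0]])
xy, z = X[:, :2], X[:, 2]   # assumption: neighbors are found from x,y only

k = 2                        # can be raised (e.g. to 50) when X has enough points
nbrs = NearestNeighbors(n_neighbors=k).fit(xy)
distances, indices = nbrs.kneighbors(xy)

# Indexing z with the 2-D index array yields an (n_points, k) array of z values;
# averaging along axis 1 gives one mean per point.
mean_values = z[indices].mean(axis=1)
print(mean_values)
```

Because kneighbors is called on the training points themselves, each row of indices includes the point's own index. If the point should be left out of its own average, one option is to call nbrs.kneighbors() with no argument, which does not count a training point as its own neighbor.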

Nick ODell On 08 March 2024 at 00:28

If you want to predict a continuous value from the values of the nearest neighbors, you can use KNeighborsRegressor.

Example:

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[6,-3, 0.1], [-5,-9, 0.5], [3,-7, 0.8], [-10,6, 0.5], [-4,-16, 0.9], [1,-0.5, 0]])
neigh = KNeighborsRegressor(n_neighbors=2, weights='uniform')
# features are the x,y columns; the target is the z column
neigh.fit(X[:, :2], X[:, 2])
# average z of the 2 nearest neighbors of the point (4, -7)
print(neigh.predict([[4, -7]]))

Since you're asking for an average of all neighbors, I used weights='uniform'. An alternative to this is weights='distance', which gives closer neighbors more weight.
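To make the difference between the two weighting options concrete, here is a small sketch (not from the original answer) that compares each prediction with a hand-computed value. It assumes that weights='distance' weights each neighbor by the inverse of its distance, as described in the scikit-learn documentation; the variable names are illustrative.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[6, -3, 0.1], [-5, -9, 0.5], [3, -7, 0.8],
              [-10, 6, 0.5], [-4, -16, 0.9], [1, -0.5, 0]])
xy, z = X[:, :2], X[:, 2]
query = np.array([[4, -7]])

uniform = KNeighborsRegressor(n_neighbors=2, weights='uniform').fit(xy, z)
by_distance = KNeighborsRegressor(n_neighbors=2, weights='distance').fit(xy, z)

# Look up the 2 nearest neighbors of the query point and their distances.
dist, idx = uniform.kneighbors(query)

# Plain average of the neighbors' z values.
manual_uniform = z[idx].mean(axis=1)

# Inverse-distance weighted average (valid here because no distance is zero).
w = 1.0 / dist
manual_distance = (w * z[idx]).sum(axis=1) / w.sum(axis=1)

print(uniform.predict(query), manual_uniform)
print(by_distance.predict(query), manual_distance)
```

With uniform weights every neighbor contributes equally, so the prediction is the plain mean; with distance weights the closer of the two neighbors pulls the result toward its own z value.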

Docs
