In the slides of a course, I found the following definitions:
Given a set P in R^D and a query point q, its NN is the point p_0 in P such that:
dist(p_0, q) <= dist(p, q), for every p in P.
Similarly, with an approximation factor ε, where 0 < ε < 1, the ε-NN is a point p_0 such that:
dist(p_0, q) <= (1+ε) * dist(p, q), for every p in P.
(I wonder why ε can't reach 1).
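To make sure I read the definition right, here is a brute-force check of it (my own illustration, not from the slides; `dist` is the Euclidean distance):

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_eps_nn(p0, q, P, eps):
    """The slide's definition: p0 is an eps-NN of q in P iff
    dist(p0, q) <= (1 + eps) * dist(p, q) for every p in P."""
    return all(dist(p0, q) <= (1 + eps) * dist(p, q) for p in P)
```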
We build a KD-tree and then we search for the NN with the algorithm from the slides, which is correct as far as my understanding and my testing go.
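Since I can't paste the slide here, this is a minimal Python sketch of the search as I understand it (the `Node` fields and helper names are my own; the slides use `cut-coor`, `left-child`, and `right-child`):

```python
import math

def dist(a, b):
    # Euclidean distance, repeated here so this snippet runs on its own
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class Node:
    """kd-tree node: leaves carry a point, internal nodes a cutting hyperplane."""
    def __init__(self, point=None, cut_dim=None, cut_val=None, left=None, right=None):
        self.point = point      # set only at leaves
        self.cut_dim = cut_dim  # cutting dimension (internal nodes)
        self.cut_val = cut_val  # cutting value along that dimension
        self.left = left        # subtree with coordinate <= cut_val
        self.right = right      # subtree with coordinate >  cut_val

def nn_search(node, q, best=None):
    """Exact NN of q in the subtree rooted at node."""
    if node is None:
        return best
    if node.point is not None:  # leaf: update the current best
        if best is None or dist(node.point, q) < dist(best, q):
            best = node.point
        return best
    # visit the child on q's side of the hyperplane first
    if q[node.cut_dim] <= node.cut_val:
        near, far = node.left, node.right
    else:
        near, far = node.right, node.left
    best = nn_search(near, q, best)
    # visit the far child only if the hyperplane is closer than the current best
    if best is None or abs(q[node.cut_dim] - node.cut_val) < dist(best, q):
        best = nn_search(far, q, best)
    return best

# tiny hand-built tree over the points (2,3), (5,4), (9,6)
root = Node(cut_dim=0, cut_val=5,
            left=Node(cut_dim=1, cut_val=3,
                      left=Node(point=(2, 3)), right=Node(point=(5, 4))),
            right=Node(point=(9, 6)))
print(nn_search(root, (6, 5)))  # -> (5, 4)
```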
How should I modify the above algorithm in order to perform Approximate Nearest Neighbour Search (ANNS)?
My thought is to multiply the current best distance (at the update step in the leaf) by ε and leave the rest of the algorithm as is. However, I am not sure whether this is correct. Can someone explain?
PS - I understand how the search for the NN works.
Note that I asked on the Computer Science site, but got no answer!
The one modification needed is to replace `current best distance` with `current best distance / (1+ε)` in the pruning test. This prunes the nodes that cannot contain a point violating the new inequality.

The reason this works is the following. Assuming that `cut-coor(q)` is on the left side, the test

cut-val - cut-coor(q) < current best distance

checks whether the hyperplane separating `left-child` and `right-child` is closer to the query point `q` than `current best distance`. That is a necessary condition for a point in `right-child` to be closer than that to `q`, because the line segment joining `q` and any point in `right-child` passes through that hyperplane. By replacing `d(p_0, q) = current best distance` with `current best distance / (1+ε)`, we are instead checking whether any point `p` on the right side could satisfy

dist(p, q) < dist(p_0, q) / (1+ε),

which is equivalent to

(1+ε) * dist(p, q) < dist(p_0, q),

i.e. a witness to the violation of the approximate nearest neighbor guarantee.
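To make the change concrete, here it is applied to a sketch of the search (assuming a kd-tree whose leaves store points, with node fields `cut_dim`/`cut_val`/`left`/`right` and a Euclidean `dist` helper as in the question's snippet; this illustrates the change, it is not the slides' exact pseudocode):

```python
def ann_search(node, q, eps, best=None):
    """(1+eps)-approximate NN of q: the returned p_0 satisfies
    dist(p_0, q) <= (1 + eps) * dist(p, q) for every p in the tree."""
    if node is None:
        return best
    if node.point is not None:  # leaf: the update step is unchanged
        if best is None or dist(node.point, q) < dist(best, q):
            best = node.point
        return best
    if q[node.cut_dim] <= node.cut_val:
        near, far = node.left, node.right
    else:
        near, far = node.right, node.left
    best = ann_search(near, q, eps, best)
    # the one change: prune with dist(best, q) / (1 + eps) instead of dist(best, q),
    # so the far child is visited only if it could hold a point p with
    # (1 + eps) * dist(p, q) < dist(best, q)
    if best is None or abs(q[node.cut_dim] - node.cut_val) < dist(best, q) / (1 + eps):
        best = ann_search(far, q, eps, best)
    return best
```

With `eps = 0` this is exactly the original search; a larger `eps` prunes more aggressively and only weakens the guarantee to the inequality above.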