R package 'BiocNeighbors' returns different nearest neighbors than other packages (e.g., FNN)


I recently needed k-nearest-neighbor (kNN) search and found several packages that provide it. I tried three of them (BiocNeighbors, FNN, RANN) to find the nearest neighbors of each point. However, BiocNeighbors with the Annoy backend ('RcppAnnoy') gave a different result for the 4th point: its third neighbor should be 8, but BiocNeighbors returns 3.

The reproducible code is below: `

set.seed(1234567)
cls_1_c1 <- rnorm(3, mean = 1, sd = 0.2)
cls_1_c2 <- rnorm(3, mean = 2, sd = 0.8)


cls_2_c1 <- rnorm(3, mean = 4, sd = 0.2)
cls_2_c2 <- rnorm(3, mean = 6, sd = 0.8)


cls_3_c1 <- rnorm(3, mean = 7, sd = 0.2)
cls_3_c2 <- rnorm(3, mean = 8, sd = 0.8)

dat <- cbind(c(cls_1_c1, cls_2_c1, cls_3_c1), c(cls_1_c2, cls_2_c2, cls_3_c2))

colnames(dat) <- c("c1", "c2")
dat <- as.data.frame(dat)
dat$name <- paste0("p", 1:9)

plot(x = dat$c1, y = dat$c2, xlab = "x", ylab = "y")
text(dat$c1, dat$c2, dat$name)

dat_mat <- as.matrix(dat[, c("c1", "c2")])

res_annoy <- BiocNeighbors::findKNN(dat_mat, k = 3, BNPARAM = BiocNeighbors::AnnoyParam(ntrees = 1000))
print(res_annoy$index)

res_fnn <- FNN::knn.index(dat_mat, k = 3)
print(res_fnn)

res_rann <- RANN::nn2(data = dat_mat, query = dat_mat, k = 4)
print(res_rann$nn.idx[, -1])`

As you can see, the 4th row of print(res_annoy$index) differs from the other two results, which agree with each other.

` > print(res_annoy$index)

 [4,]    5    6    3 *


> print(res_fnn)
[4,]    5    6    8 *


> print(res_rann$nn.idx[, -1])
[4,]    5    6    8 *

` Could you please help me figure out the possible reason for this difference, even though the input is the same?

Thanks.

I ran the script above and expected the same results from all three methods.
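
For reference, Annoy is an approximate nearest-neighbor method, so its output is not guaranteed to match an exact search. One way to check which result is the true one is to compute the neighbors by brute force from the full distance matrix. The following is only a minimal sketch in base R; `exact_knn_index` is a hypothetical helper name introduced here, not a function from any of the packages above, and it assumes Euclidean distance with no exact ties.

```r
# Exact k-nearest neighbours by brute force, using only base R.
# NOTE: `exact_knn_index` is a hypothetical helper, not part of any package.
exact_knn_index <- function(x, k) {
  x <- as.matrix(x)
  n <- nrow(x)
  stopifnot(k >= 1, k < n)

  d <- as.matrix(dist(x))  # full n x n Euclidean distance matrix
  diag(d) <- Inf           # a point is never its own neighbour

  # For each point, sort the distances and keep the k closest indices
  idx <- matrix(NA_integer_, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    idx[i, ] <- order(d[i, ])[seq_len(k)]
  }
  idx
}

# Small self-contained demo: four points on a line
pts <- cbind(c(0, 1, 3, 7), 0)
print(exact_knn_index(pts, k = 2))
```

Calling `exact_knn_index(dat_mat, k = 3)` on the matrix from the script and looking at row 4 should show which of the two answers is exact. Within BiocNeighbors itself, passing `BNPARAM = BiocNeighbors::KmknnParam()` to `findKNN` selects an exact search instead of Annoy, which may be a more direct comparison with FNN and RANN.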
