I have a time series dataset from 1970 to 2020 as my training dataset, and a single additional observation for 2021. I need to use the Mahalanobis distance to identify the 10 nearest neighbors of 2021 in the training dataset. I tried several functions, such as get.knn() and get.knnx(), but I could not set the distance to the Mahalanobis distance. Is there any function that I can use? Thank you in advance!
------------------edit--------------------
So I tried the mahalanobis() function and got a list of values. Are these values the Mahalanobis distances? Can I sort them to get the top 10?
Background
The Mahalanobis distance measures how far a point is from the mean, measured in standard deviations; see Wikipedia. It uses eigenvalue-rotated coordinates and is related to principal component analysis. Cross Validated contains several excellent explanations, e.g. this "bottom-to-top explanation", or a function (cholMaha, see below) that estimates a distance matrix.
Relationship of Mahalanobis distance to PCA
Let's assume a small data example:
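The original example data did not survive here, so the following is a hypothetical stand-in: six observations of three correlated variables. The name `A` and all values are assumptions chosen only for illustration.

```r
# Hypothetical example data (not the original values):
# six observations of three correlated variables
A <- data.frame(
  x = c(-2.5, -4.0, 1.2, 0.9, 5.3, 4.7),
  y = c(-3.9, -3.4, 0.9, 0.8, 3.4, 0.5),
  z = c(-1.1, -2.2, 0.2, 0.3, 1.7, 1.1)
)

# The covariance matrix is what the Mahalanobis distance rescales by
S <- cov(A)
print(S)
```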
Then we can estimate the Mahalanobis distance matrix via D2.dist from package biotools or the above-mentioned function.
Now comes the point. We can also estimate the Mahalanobis distance as the Euclidean distance of the rescaled scores (the rotated data) of a principal components analysis:
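A minimal base-R sketch of this comparison follows. It is self-contained, so it redefines a hypothetical data frame `A`. The `cholMaha` shown here is a Cholesky-based function written in the spirit of the Cross Validated one; the version posted there may differ in detail. `biotools::D2.dist` is not called, so that the sketch runs without extra packages; to my understanding it returns squared distances, so its output would be compared against `d_chol^2`.

```r
# Hypothetical example data (values are assumptions for illustration)
A <- data.frame(
  x = c(-2.5, -4.0, 1.2, 0.9, 5.3, 4.7),
  y = c(-3.9, -3.4, 0.9, 0.8, 3.4, 0.5),
  z = c(-1.1, -2.2, 0.2, 0.3, 1.7, 1.1)
)

# Pairwise Mahalanobis distances via the Cholesky factor of the covariance
cholMaha <- function(X) {
  X   <- as.matrix(X)
  dec <- chol(cov(X))                    # upper triangular R with R'R = cov(X)
  tmp <- forwardsolve(t(dec), t(X))      # whitened data, one column per row of X
  dist(t(tmp))                           # Euclidean distance of whitened rows
}
d_chol <- cholMaha(A)

# The same distances via PCA: rotate, then rescale each component by its sd
pc     <- prcomp(A, center = TRUE, scale. = FALSE)
scores <- sweep(pc$x, 2, pc$sdev, "/")   # unit-variance principal component scores
d_pca  <- dist(scores)

# Cross-check one pair against stats::mahalanobis(), which returns SQUARED distances
M      <- as.matrix(A)
d_12   <- sqrt(mahalanobis(M[1, , drop = FALSE], center = M[2, ], cov = cov(M)))

print(all.equal(as.vector(d_chol), as.vector(d_pca)))
```

The PCA route requires a covariance matrix of full rank (more observations than variables and no exactly collinear columns); otherwise some `pc$sdev` values are zero and the rescaling breaks down.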
The result is identical to the two approaches above.
Application to a classification algorithm
One can now use this relation in any classification algorithm: transform the data matrix by the PCA rotation, rescale each component to unit variance, and pass the result to any method that works with Euclidean distances.
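Applied to the question, a sketch could look as follows. The data are simulated, and the names `train`, `query`, `v1` to `v3` are hypothetical. The PCA is fitted on the training years only, the 2021 observation is projected with the same rotation and scaling, and the 10 smallest Euclidean distances in the transformed space are the 10 Mahalanobis nearest neighbors. The result is cross-checked against `mahalanobis()`, which returns squared distances, so sorting its output directly gives the same ranking.

```r
set.seed(1)

# Hypothetical training data: one row per year 1970-2020, three variables
train <- matrix(rnorm(51 * 3), ncol = 3,
                dimnames = list(1970:2020, c("v1", "v2", "v3")))
train[, 2] <- train[, 2] + 0.6 * train[, 1]   # make two variables correlated

# Hypothetical single observation for 2021
query <- matrix(c(0.5, -0.2, 1.0), nrow = 1,
                dimnames = list("2021", colnames(train)))

# PCA fitted on the training data only
pc <- prcomp(train, center = TRUE, scale. = FALSE)

# Rotate both data sets, then rescale every component to unit variance
train_w <- sweep(pc$x, 2, pc$sdev, "/")
query_w <- sweep(predict(pc, newdata = query), 2, pc$sdev, "/")

# Euclidean distance in the transformed space = Mahalanobis distance
d <- sqrt(rowSums(sweep(train_w, 2, as.vector(query_w), "-")^2))

# The 10 nearest training years and their distances
nn <- head(sort(d), 10)
print(nn)

# Cross-check: mahalanobis() gives squared distances to 'center'
d_ref <- sqrt(mahalanobis(train, center = as.vector(query), cov = cov(train)))
print(all.equal(unname(d), unname(d_ref)))
```

The transformed matrices `train_w` and `query_w` could presumably also be passed to a Euclidean k-nearest-neighbor routine such as `get.knnx()`; that call is not shown here because it was not tested in this sketch.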
In effect, the rescaling amplifies small influence factors hidden in the variables, but unfortunately it amplifies random errors as well. To understand this in detail, read the "My grandma cooks" answer at Cross Validated.