This is a cross-post of a bug filed on the OpenSearch KNN repo -- see https://github.com/opensearch-project/k-NN/issues/1525
The k-NN score function for inner product is defined in the OpenSearch documentation as:
(screenshot of the inner-product score formula from the documentation)
If I run a KNN search without any filters, for example:
    const query = {
      size: 10,
      query: {
        knn: {
          rasterizedImageEmbedding: {
            vector: vec,
            k: 500,
          },
        },
      },
      sort: [{ _score: { order: 'desc' } }, { dateCreated: { order: 'desc' } }],
      _source: {},
    };
I get output scores that match the documented formula. For example, I will get a return like:
id: 8840b3af-f53f-4e8f-abcd-61116b183a17
opensearch returned score: 1.8678249
my calculated distance: -0.8678249630676234
my calculated score: 1.8678249630676234
where "my calculated distance" and "my calculated score" come from my own implementations of the distance function and the scoring formula.
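For reference, a check like this can be sketched as follows. This is a minimal sketch, not the exact code used for the numbers above. It assumes the inner-product convention as I understand the k-NN documentation to describe it: the distance is the negated dot product, a negative distance d scores 1 - d, and a non-negative distance scores 1 / (1 + d). Only the negative branch is confirmed by the example above, and the function names are hypothetical.

```javascript
// Hypothetical helpers; these names are illustrative and not part of any OpenSearch client.

// Distance under the assumed inner-product convention: the negated dot product.
function innerProductDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('vectors must have the same length');
  }
  let dot = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
  }
  return -dot;
}

// Assumed score mapping: 1 / (1 + d) for d >= 0, otherwise 1 - d.
function innerProductScore(distance) {
  return distance >= 0 ? 1 / (1 + distance) : 1 - distance;
}

// Plugging in the distance from the result above reproduces the calculated score.
console.log(innerProductScore(-0.8678249630676234));
```

With a stored vector `doc` and the query vector `vec`, the comparison is `innerProductScore(innerProductDistance(vec, doc))` against the `_score` OpenSearch returns.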
If I run the same search with an additional filter, for example:
    const query2 = {
      size: 10,
      query: {
        knn: {
          rasterizedImageEmbedding: {
            vector: vec,
            k: 500,
            filter: {
              bool: {
                must: {
                  match: {
                    spaceId,
                  },
                },
              },
            },
          },
        },
      },
      sort: [{ _score: { order: 'desc' } }, { dateCreated: { order: 'desc' } }],
      _source: {},
    };
I get output scores that are fairly confusing and do not seem to map to a particular formula (certainly not the one in the screenshot above):
id: 8840b3af-f53f-4e8f-abcd-61116b183a17
opensearch returned score: 0.9339125
my calculated distance: -0.8678249630676234
my calculated score: 1.8678249630676234
Note that for the same UUID, the score OpenSearch returns changes from 1.8678249 to 0.9339125, while my calculated distance and score stay the same.
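One pattern does show up in the two results above, although it is only an observation from a single document and not a confirmed explanation: the filtered score is almost exactly half the unfiltered one, which is the same as (1 + inner product) / 2. A quick arithmetic check using only the values printed above:

```javascript
// Values copied from the two results above for the same document id.
const unfilteredScore = 1.8678249;
const filteredScore = 0.9339125;
const innerProduct = 0.8678249630676234; // the negated "calculated distance"

// Ratio between the filtered and unfiltered scores.
const ratio = filteredScore / unfilteredScore;

// Candidate mapping that fits the filtered score: (1 + inner product) / 2.
const halfShifted = (1 + innerProduct) / 2;

console.log(ratio.toFixed(4));       // 0.5000
console.log(halfShifted.toFixed(7)); // 0.9339125
```

Whether the filtered query really goes through a different scoring path is exactly what I am unable to confirm from the documentation.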
What is causing this inconsistency? Is there any documentation describing how OpenSearch behaves here?