Inconsistent results when using KNN innerproduct scoring on OpenSearch depending on whether a filter is set


This is a cross-post of a bug filed on the OpenSearch KNN repo -- see https://github.com/opensearch-project/k-NN/issues/1525

The KNN score function for inner product is defined in the OpenSearch documentation as:

[image: inner-product score formula from the OpenSearch documentation]

If I run a KNN search without any filters, for example:

    // vec is the query embedding
    const query = {
      size: 10,
      query: {
        knn: {
          rasterizedImageEmbedding: {
            vector: vec,
            k: 500,
          },
        },
      },
      sort: [{ _score: { order: 'desc' } }, { dateCreated: { order: 'desc' } }],
      _source: {},
    };

I get output scores that match the documented formula. For example, I get a result like:

id: 8840b3af-f53f-4e8f-abcd-61116b183a17                                        
opensearch returned score: 1.8678249                      
my calculated distance: -0.8678249630676234          
my calculated score: 1.8678249630676234   

where my calculated distance/score are my own personal implementations of the distance function and the scoring algorithm.
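
A minimal sketch of such a calculation follows. It assumes the distance is the negative dot product of the two vectors and that the score is piecewise: 1 / (1 + d) for d >= 0, and 1 - d for d < 0. That assumption is consistent with the numbers shown above, but the function names are illustrative and this is not the exact code used to produce them.

```javascript
// Assumed convention: "distance" for inner product is the negative dot product.
function innerProductDistance(a, b) {
  if (a.length !== b.length) {
    throw new Error('vector length mismatch');
  }
  let dot = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
  }
  return -dot;
}

// Assumed piecewise score: 1 / (1 + d) when d >= 0, otherwise 1 - d.
function innerProductScore(d) {
  return d >= 0 ? 1 / (1 + d) : 1 - d;
}

// Distance quoted above for id 8840b3af-...; the score comes out at roughly 1.8678.
const exampleDistance = -0.8678249630676234;
console.log(innerProductScore(exampleDistance));
```

With a negative distance, the second branch applies, which is why the expected score here is greater than 1.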

If I run the same search with an additional filter, for example:

    // Same query as before, plus a filter on spaceId
    const query2 = {
      size: 10,
      query: {
        knn: {
          rasterizedImageEmbedding: {
            vector: vec,
            k: 500,
            filter: {
              bool: {
                must: {
                  match: {
                    spaceId,
                  },
                },
              },
            },
          },
        },
      },
      sort: [{ _score: { order: 'desc' } }, { dateCreated: { order: 'desc' } }],
      _source: {},
    };

I get output scores that are fairly confusing and do not seem to map to a particular formula (certainly not the one in the screenshot above):

id: 8840b3af-f53f-4e8f-abcd-61116b183a17
opensearch returned score: 0.9339125
my calculated distance: -0.8678249630676234
my calculated score: 1.8678249630676234

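Note that for the same UUID and the same query vector, the score OpenSearch returns with the filter (0.9339125) differs from the score it returns without the filter (1.8678249), while my calculated distance and score are unchanged.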
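
One detail that may help narrow this down: for this document, the filtered score is almost exactly half of the unfiltered one. The quick check below uses only the two scores quoted in this post. It is a single data point, so it does not establish the cause or show that the same factor holds for other documents.

```javascript
// Scores returned by OpenSearch for id 8840b3af-..., as quoted in this post.
const unfilteredScore = 1.8678249;
const filteredScore = 0.9339125;

// Ratio of the filtered score to the unfiltered score.
const ratio = filteredScore / unfilteredScore;
console.log(ratio.toFixed(4)); // "0.5000"
```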

What is causing this inconsistency? Is there any documentation about how OpenSearch behaves here?
