Elasticsearch performance difference between ‘must’ and ‘must_not’

I would like to understand the performance difference between the must and must_not clauses, because I get different timings with each of them. Suppose I have 10 groups and want to make 5 of them accessible to a user while the other 5 are excluded. I can write the query in two ways:

I can use a must clause inside a bool query: must: ['1', '2', '3', '4', '5']. Or I can use a must_not clause inside a bool query: must_not: ['6', '7', '8', '9', '10'].
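
For concreteness, the two variants could be written roughly as the following request bodies, shown here as Python dictionaries. The field name `group_id` and the use of a `terms` query are assumptions made for illustration; the question does not say which field or query type is actually used.

```python
import json

# Hypothetical field name; the original question does not name the field.
GROUP_FIELD = "group_id"

# Variant 1: keep only documents whose group is one of the allowed ones.
must_query = {
    "query": {
        "bool": {
            "must": [
                {"terms": {GROUP_FIELD: ["1", "2", "3", "4", "5"]}}
            ]
        }
    }
}

# Variant 2: drop documents whose group is one of the excluded ones.
must_not_query = {
    "query": {
        "bool": {
            "must_not": [
                {"terms": {GROUP_FIELD: ["6", "7", "8", "9", "10"]}}
            ]
        }
    }
}

# Print both bodies as JSON so they can be compared side by side.
print(json.dumps(must_query, indent=2))
print(json.dumps(must_not_query, indent=2))
```

Either body could be sent to the `_search` endpoint of an index to compare the timings of the two variants.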

I have not provided many details here because I just want to know more about the performance difference between these two clauses. I read about the bool query in the Elasticsearch documentation, which says that scoring is ignored for the must_not clause, although I do not yet understand how scoring is performed in the Lucene index. I am seeing timing differences: must_not takes longer than must, which made me curious enough to post about it.

Note: I am currently using Elasticsearch 2.4.4, and upgrading is not possible at the moment. Can anyone please explain the difference, or explain both clauses in detail? I am open to any suggestions and answers. Thanks in advance.

There is 1 best solution below

A must clause is potentially more efficient because it can use the inverted index directly.

Conceptually, the lookup works something like this:

if searched_keyword in inverted_index:
    return inverted_index[searched_keyword]

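must_not is more costly because the inverted index is less helpful there: it maps a term to the documents that contain it, not to the documents that lack it. The excluded documents first have to be looked up and then subtracted from the candidate set, which is every document if the query has no positive clause.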
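
The difference can be sketched with a toy inverted index. This is only an illustration of the idea, not how Lucene actually implements it; the sample documents and the helper names `match_any` and `match_none` are made up for the example.

```python
from collections import defaultdict

# Toy data: document id -> group value (hypothetical, for illustration only).
docs = {0: "1", 1: "6", 2: "3", 3: "9", 4: "5", 5: "7"}

# Build a toy inverted index: term -> set of document ids containing it.
inverted_index = defaultdict(set)
for doc_id, group in docs.items():
    inverted_index[group].add(doc_id)

def match_any(terms):
    """Positive lookup: union of the posting lists of the given terms."""
    hits = set()
    for term in terms:
        # .get() avoids creating empty entries for terms that are absent.
        hits |= inverted_index.get(term, set())
    return hits

def match_none(terms):
    """Negative lookup: start from every document, then subtract the matches."""
    return set(docs) - match_any(terms)

allowed = match_any(["1", "2", "3", "4", "5"])
not_excluded = match_none(["6", "7", "8", "9", "10"])
print(sorted(allowed), sorted(not_excluded))
```

In this sketch both calls return the same documents, but `match_none` has to touch the full set of document ids in addition to the posting lists, which is the extra work described above.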