How to fine-tune search results so that documents containing the most search words rank first in Elasticsearch 6.8?


Below is my mapping:

{
  "mappings": {
    "_doc": {
      "properties": {
        "text": { 
          "type": "text",
          "fields": {
            "raw": { 
              "type":     "keyword",
              "normalizer": "case_insensitive"
            }
          }
        }
      }
    }
  }
}

The settings look like this:

{
  "settings": {
    "index": {
      "analysis" : {
        "normalizer" : {
          "case_insensitive" : {
            "filter" : "lowercase"
          }
        },
        "analyzer" : {
          "en_std" : {
            "type" : "standard",
            "stopwords" : "_english_"
          }
        }
      }
    }
  }
}

Below is my query:

{
  "query": {
    "bool" : {
      "must" : {
        "query_string" : {
          "query" : "hawaii beach 2019",
          "analyze_wildcard": true,
          "fields": [
            "text"
          ]
        }
      }
    }
  }
}

Below is the sample data stored in Elasticsearch:

[
  {
     "text": "blue hawaii hotel"
  },
  {
     "text": "costa beach"
  },
  {
     "text": "white hawaii beach"
  },
  {
     "text": "nice hotel 2019"
  },
  {
     "text": " some 2019 white beach hawaii photo"
  },
  {
     "text": "hawaii vacation 2019"
  }
]

If my search text is hawaii, I get three results:

[
  {
     "text": "blue hawaii hotel"
  },
  {
     "text": "white hawaii beach"
  },
  {
     "text": " some 2019 white beach hawaii photo"
  }
]

If my search text is hawaii beach, I get four results:

[
  {
     "text": "blue hawaii hotel"
  },
  {
     "text": "costa beach"
  },
  {
     "text": "white hawaii beach"
  },
  {
     "text": " some 2019 white beach hawaii photo"
  }
]

If my search text is hawaii beach 2019, I get five results:

[
  {
     "text": "blue hawaii hotel"
  },
  {
     "text": "costa beach"
  },
  {
     "text": "white hawaii beach"
  },
  {
     "text": "nice hotel 2019"
  },
  {
     "text": " some 2019 white beach hawaii photo"
  }
]

This is because each returned record contains at least one word of the search text. That makes sense, but it is not exactly what I want. I want the records that contain the most matching words to appear at the top of the search results, and the records that contain fewer matching words to appear at the bottom. How can I do that in Elasticsearch 6.8? If that is not possible, returning only the record that contains the most matching words would also be acceptable.

Desired search results for the search text hawaii beach 2019:

[
  {
     "text": " some 2019 white beach hawaii photo" // Contains most matching words.
  },
  {
     "text": "white hawaii beach"
  },
  {
     "text": "blue hawaii hotel" // Contains fewer matching words.
  },
  {
     "text": "costa beach" // Contains fewer matching words.
  },
  {
     "text": "nice hotel 2019" // Contains fewer matching words.
  }
]

or

[
  {
     "text": " some 2019 white beach hawaii photo" // Contains most matching words
  }
]
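
To make the desired ordering concrete, here is a minimal Python sketch that ranks the sample documents by the number of distinct search words each one contains. It is plain Python, not Elasticsearch: the function name `rank_by_matching_words` is made up for illustration, and whitespace splitting is only a rough stand-in for what the standard analyzer does.

```python
def rank_by_matching_words(docs, search_text):
    """Sort docs by how many distinct search words they contain (most first)."""
    terms = set(search_text.lower().split())

    def match_count(doc):
        # Whitespace tokenization: a rough stand-in for the standard analyzer.
        return len(terms & set(doc["text"].lower().split()))

    # Keep only documents with at least one match; sorted() is stable,
    # so documents with equal counts keep their original relative order.
    matched = [d for d in docs if match_count(d) > 0]
    return sorted(matched, key=match_count, reverse=True)


docs = [
    {"text": "blue hawaii hotel"},
    {"text": "costa beach"},
    {"text": "white hawaii beach"},
    {"text": "nice hotel 2019"},
    {"text": " some 2019 white beach hawaii photo"},
    {"text": "hawaii vacation 2019"},
]

for doc in rank_by_matching_words(docs, "hawaii beach 2019"):
    print(doc["text"].strip())
```

The document containing all three words comes first, followed by the documents with two matches, then those with one.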

There are 2 best solutions below


You can modify your input query:

hawaii AND beach AND 2019

Then you will get only the results that contain all three words.
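
If rewriting the user's input is inconvenient, `query_string` also accepts a `default_operator` parameter, and a `minimum_should_match` parameter for requiring only some of the words. The following is a minimal Python sketch that only builds the request body. The helper name `build_query` is hypothetical, and the result should be checked against your own 6.8 index before relying on it.

```python
import json


def build_query(search_text, min_should_match=None):
    """Build a query_string request body for the "text" field.

    With min_should_match=None every word is required (default_operator AND).
    Otherwise words are optional (OR) and min_should_match says how many
    must match, e.g. "2" or "75%".
    """
    query_string = {"query": search_text, "fields": ["text"]}
    if min_should_match is None:
        query_string["default_operator"] = "AND"
    else:
        query_string["default_operator"] = "OR"
        query_string["minimum_should_match"] = min_should_match
    return {"query": {"query_string": query_string}}


# All three words required, same intent as "hawaii AND beach AND 2019".
print(json.dumps(build_query("hawaii beach 2019"), indent=2))

# At least two of the three words required.
print(json.dumps(build_query("hawaii beach 2019", "2"), indent=2))
```

The second form is a middle ground: it drops documents that match only one word while still returning those that match two.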


I think I have found a workaround: surround each word in the search string with *, as follows.

{ 
  "query": { 
    "bool": { 
      "must": { 
        "bool": { 
          "should": { 
            "query_string": { 
              "query": "*hawaii* *beach* *2019*", 
              "fields": ["text"]
            } 
          } 
        } 
      } 
    } 
  } 
}

With this query I get all documents that contain at least one word of the search string. Documents with the most matching search words appear at the top of the list.
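
For reference, the wrapping step can be done in application code before the request is sent. This is a minimal Python sketch with a hypothetical helper name, `build_wildcard_query`. It assumes the words are separated by whitespace and does not escape characters that have special meaning in `query_string` syntax.

```python
import json


def build_wildcard_query(search_text):
    """Wrap each whitespace-separated word in * and build the request body."""
    wrapped = " ".join("*{}*".format(word) for word in search_text.split())
    # Same bool/must/bool/should structure as the query above.
    return {
        "query": {
            "bool": {
                "must": {
                    "bool": {
                        "should": {
                            "query_string": {
                                "query": wrapped,
                                "fields": ["text"],
                            }
                        }
                    }
                }
            }
        }
    }


print(json.dumps(build_wildcard_query("hawaii beach 2019"), indent=2))
```

Keep in mind that leading wildcards can be slow on large indices, because many terms have to be examined to find the matches.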