Elasticsearch multi-match scoring based on the number of matching highlights

I am running a multi-match search with the following query object, which uses script_score:

{
    _source: [
        'baseline',
        'cpcr',
        'date',
        'description',
        'dev_status',
        'element',
        'event',
        'id'
    ],
    track_total_hits: true,
    query: {
       script_score: {
           query: {
               bool: {
                   filter: []
               },
           },
           script: {
               source: "def v=doc['description'].value; def score = 10000; score += v.length(); score -= " + "\"" + searchObject.query + "\"" + ".indexOf(v)*50;", // throws error
               params: { highlights: 3 }
           }
       }
    },
    highlight: { fields: { '*': {} } },
    sort: [],
    from: 0,
    size: 50
}

I'd like the results to be ordered by their number of highlight matches. For instance, the first record would have 5 <em> matches, the second record would have 4 <em> matches, and so on. Currently my results aren't sorted this way.

elasticsearch.config.ts

"settings": {
        "analysis": {
            "analyzer": {
                "search_synonyms": {
                    "tokenizer": "whitespace",
                    "filter": [
                        "graph_synonyms",
                        "lowercase",
                        "asciifolding"
                    ],
                }
            }
        }
    },

    "mappings": {
        "properties": {
            "description": {
                "type": "text",
                "analyzer": "search_synonyms"
            },
            "narrative": {
                "type":"object",
                "properties":{
                    "_all":{
                        "type": "text",
                        "analyzer": "search_synonyms"
                    }
                }
            },
        }
    }

Sample data

There is 1 best solution below

Joe - Check out my books

I don't think that's possible.

When you think about it, since you're using multi_match, the docs with the most field matches would probably score highest, which increases the chances that they would also have the most <em> tags. It'd still be possible to post-process the hits on the client and sort them by the number of occurrences.

The reason it's not possible is that highlighting works outside of the sort API: highlights are generated in the fetch phase, after the hits have already been scored and sorted, so one cannot reach the other. One can always 'hack' it with some fancy script, but there's no straightforward way to do it.
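
To illustrate the post-processing idea, here is a minimal client-side sketch. It assumes the default <em> highlight tag and a hit shape with `_id`, `_score` and an optional `highlight` map. The `Hit` interface and the function names `countHighlights` and `sortByHighlightCount` are made up for this example and are not part of any Elasticsearch client. Note that it only reorders the page of hits already returned (here up to `size: 50`), not the whole result set, and the count is limited to the fragments the highlighter actually returns.

```typescript
// Hypothetical minimal shape of a search hit; only the fields used below.
interface Hit {
  _id: string;
  _score: number | null;
  highlight?: Record<string, string[]>;
}

// Count the opening <em> tags across all highlighted fragments of one hit.
// Assumes the default highlighter pre_tag "<em>".
function countHighlights(hit: Hit): number {
  const highlight = hit.highlight ?? {};
  let total = 0;
  for (const field of Object.keys(highlight)) {
    for (const fragment of highlight[field]) {
      total += (fragment.match(/<em>/g) ?? []).length;
    }
  }
  return total;
}

// Return a new array sorted by highlight count (descending),
// using the original _score to break ties.
function sortByHighlightCount(hits: Hit[]): Hit[] {
  return [...hits].sort(
    (a, b) =>
      countHighlights(b) - countHighlights(a) ||
      (b._score ?? 0) - (a._score ?? 0)
  );
}

// Made-up sample hits, standing in for response.hits.hits.
const sampleHits: Hit[] = [
  { _id: "a", _score: 3.2, highlight: { description: ["one <em>match</em> here"] } },
  {
    _id: "b",
    _score: 1.1,
    highlight: { description: ["<em>foo</em> and <em>bar</em>", "more <em>foo</em>"] },
  },
  {
    _id: "c",
    _score: 2.0,
    highlight: { description: ["<em>foo</em>"], "narrative._all": ["<em>bar</em>"] },
  },
  { _id: "d", _score: 5.0 },
];

const sortedHits = sortByHighlightCount(sampleHits);
console.log(sortedHits.map((h) => `${h._id}:${countHighlights(h)}`).join(" "));
```

In a real application `sampleHits` would be replaced by the `hits.hits` array of the search response. If the fragment limit matters, setting `number_of_fragments: 0` on a highlighted field makes Elasticsearch return the whole field content highlighted rather than a handful of fragments, at the cost of larger responses.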


Addendum

Check out this related answer to access multiple fields within a script: https://stackoverflow.com/a/61620705/8160318