ElasticSearch, calculate 75th percentile of the first 25 hits of _score

In ElasticSearch I'm running a multi_match query against three fields (field1, field2, field3). I now want to calculate the 75th percentile of the _score values using an Elasticsearch aggregation (aggs). The calculation should take place within the ElasticSearch query itself.

query = {
    "size": 25,
    "query": {
        "multi_match": {
            "query": "keyphrase",
            "fields": ["field1", "field2", "field3"]
        }
    },
    "aggs": {
        "percentile_score": {
            "percentiles": {
                "field": "_score",
                "percents": [75.0]
            }
        }
    }
}
response = client.search(index=INDEX_NAME, body=query)
for hit in response["hits"]["hits"]:
    print(f"Score: {hit['_score']}")

Score: 9.517459
Score: 8.774883
...
Score: 5.489334
Score: 4.481924

response["aggregations"]["percentile_score"]["values"]["75.0"]

I expect the 75th percentile to be returned, but I only get the value None.

There is 1 best solution below

imotov On

First of all, I would like to mention that aggregations don't depend on the hits that you are getting back. You can set "size" to 0, 10, 100 or 1000 and you will get exactly the same aggregation result every time. This happens because aggregations are calculated on the entire result set (every document that matches the query), not just on the first 10 or 25 hits that you happen to retrieve.
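Since the aggregation ignores "size", a percentile over exactly the returned hits would have to be derived from the hits themselves. The following is only a minimal client-side sketch, not something Elasticsearch does for you: the `percentile` helper and `sample_response` are hypothetical names, the interpolation rule (linear, between closest ranks) is an assumption, and the sample holds only the four scores shown in the question rather than all 25.

```python
import math


def percentile(values, pct):
    """Return the pct-th percentile (0-100) of values.

    Uses linear interpolation between the two closest ranks,
    which is an assumed convention; other definitions exist.
    """
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    rank = (pct / 100.0) * (len(ordered) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return ordered[lower]
    fraction = rank - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical stand-in for the dict returned by client.search(...);
# it contains only the four scores visible in the question.
sample_response = {
    "hits": {
        "hits": [
            {"_score": 9.517459},
            {"_score": 8.774883},
            {"_score": 5.489334},
            {"_score": 4.481924},
        ]
    }
}

# Collect the scores of the hits that were actually returned.
scores = [hit["_score"] for hit in sample_response["hits"]["hits"]]
p75 = percentile(scores, 75)
print(f"75th percentile of returned scores: {p75:.6f}")
```

With a real response, `scores` would be built the same way from `response["hits"]["hits"]`, so the result covers exactly the hits limited by "size".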

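The second issue is that running a percentiles aggregation on _score is not supported by Elasticsearch and is unlikely to be supported in the near future. _score is computed per query rather than stored as a document field, which is why the aggregation in your request has no values to work with and returns None.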

I would love to suggest an alternative, but I have no idea what you expect the 75th percentile of the _score of the first 25 hits to represent. In other words, what meaning are you trying to extract from this number?