I have an API that builds the following Elasticsearch query:
"should": [
  {
    "multi_match": {
      "query": "1225",
      "fields": [
        "id^1.0",
        "title^1.0",
        "titleNgram^1.0"
      ],
      "type": "most_fields",
      "operator": "OR",
      "slop": 0,
      "prefix_length": 0,
      "max_expansions": 50,
      "zero_terms_query": "NONE",
      "auto_generate_synonyms_phrase_query": true,
      "fuzzy_transpositions": true,
      "boost": 1
    }
  }
]
id field: built by the client. It has no distinct pattern or length, can be a mix of letters and numbers, and is sometimes filtered by prefix. I cannot reliably tell programmatically whether a free-text search query is an ID.
title field: plain string search.
Problem: the ngram field (titleNgram) causes many irrelevant results to be returned. For example, the query "nurse" returns both "nurse" and "customer service representative".
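To illustrate what I suspect is happening, here is a minimal Python sketch of character n-gram overlap. It is not my actual analyzer: the `min_gram=2` and `max_gram=3` settings, the whitespace tokenization, and the idea that the same ngram analysis is also applied to the query text are all assumptions made for illustration, and `char_ngrams` is a hypothetical helper.

```python
def char_ngrams(text, min_gram=2, max_gram=3):
    """Return the set of character n-grams of every whitespace-separated word.

    min_gram and max_gram are assumed values, not the real index settings.
    """
    grams = set()
    for word in text.lower().split():
        for n in range(min_gram, max_gram + 1):
            for i in range(len(word) - n + 1):
                grams.add(word[i:i + n])
    return grams


# Grams produced for the query and for the irrelevant title.
query_grams = char_ngrams("nurse")
title_grams = char_ngrams("customer service representative")

# Any shared gram is enough for an OR query to count the title as a match.
shared = sorted(query_grams & title_grams)
print(shared)
```

Under these assumptions a single short gram shared between the query and the title is enough for the document to match, which would explain the unrelated hits.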
Removing the ngram field from the query helps, but then the search is no longer fuzzy enough to catch closely related terms. For example:
query: "Police man" results: "police man", "milk man", "pizza man" missing from results: "policeman"
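My guess at why "policeman" drops out without the ngram field can be sketched in Python as well. This assumes the title field is analyzed into lowercased, whitespace-separated word tokens (roughly what a standard-style analyzer would do for these inputs); `word_tokens` is a hypothetical helper, not part of Elasticsearch.

```python
def word_tokens(text):
    """Lowercase the text and split on whitespace (assumed analysis for title)."""
    return set(text.lower().split())


query_tokens = word_tokens("Police man")
titles = ["police man", "milk man", "pizza man", "policeman"]

# Record which query tokens each title shares; an empty list means no match.
overlaps = {}
for title in titles:
    overlaps[title] = sorted(query_tokens & word_tokens(title))
    print(f"{title!r}: {overlaps[title]}")
```

With whole-word tokens, "milk man" and "pizza man" still match on "man", while "policeman" is a single token that equals neither "police" nor "man", so it shares nothing with the query.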
Are there any suggestions for what I can tinker with here to get more accurate results?