I'm struggling to understand a result I'm getting while using the suggest API.
I don't want this result to be returned.
To reproduce, here is my mapping:
PUT /movies
{
  "settings": {
    "analysis": {
      "filter": {
        "true_false_filter": {
          "type": "keep",
          "keep_words": [
            "true",
            "false"
          ]
        },
        "french_elision": {
          "type": "elision",
          "articles_case": false,
          "articles": [
            "puisqu"
          ]
        },
        "french_stemmer": {
          "type": "stemmer",
          "language": "light_french"
        },
        "organic-dictionary": {
          "type": "synonym",
          "expand": true,
          "lenient": true,
          "synonyms": [
            "non bio"
          ]
        },
        "french_stop_filter": {
          "type": "stop",
          "ignore_case": true,
          "stopwords": "_french_"
        }
      },
      "analyzer": {
        "lowercase_stop_analyzer": {
          "tokenizer": "lowercase",
          "filter": [
            "french_stop_filter"
          ]
        },
        "lowercase_asciifolding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase"
          ]
        },
        "french_analyzer_custom": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "french_elision",
            "french_stemmer"
          ]
        },
        "custom_organic_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "asciifolding",
            "lowercase",
            "french_elision",
            "organic-dictionary",
            "true_false_filter",
            "unique"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "attr": {
        "type": "text",
        "analyzer": "french_analyzer_custom"
      },
      "brand_name": {
        "type": "keyword"
      },
      "brand_name_suggest": {
        "type": "completion",
        "analyzer": "lowercase_stop_analyzer",
        "search_analyzer": "lowercase_asciifolding",
        "preserve_separators": false,
        "preserve_position_increments": false,
        "max_input_length": 50
      }
    }
  }
}
Then I put a document in the index:
POST /movies/_doc/1001
{
  "brand_name": "A LE MOUTON HUILE D'OLIVE",
  "brand_name_suggest": [
    "A LE MOUTON HUILE D'OLIVE"
  ]
}
Then I run my search:
GET movies/_search
{
  "explain": true,
  "suggest": {
    "completer": {
      "text": "amo",
      "completion": {
        "field": "brand_name_suggest",
        "size": 20,
        "skip_duplicates": true
      }
    }
  }
}
My issue: why is this document found when searching for "amo"?
And how can I prevent it from being returned?
Thanks in advance
Since the brand_name_suggest field uses the lowercase_stop_analyzer, which removes French stop words, A LE MOUTON HUILE D'OLIVE would be analyzed as a, mouton, huile, olive, i.e. LE is getting removed. Because the field also sets preserve_separators and preserve_position_increments to false, the gap left by the removed word is ignored and the remaining tokens are treated as adjacent.
So at search time, when you type amo, it matches the first two tokens (a followed by the start of mouton), which is why you're getting this document. If you want to prevent this, you need to remove the french_stop_filter from your index-time analyzer.
Another issue that might come back to bug you later is that your search analyzer lowercase_asciifolding does ASCII folding but your index-time analyzer doesn't, so if you index words with accents, you might not find them at search time either.
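As a rough sketch of how to check and apply this: the first request below uses the Analyze API to show which tokens lowercase_stop_analyzer actually emits for the brand name, so you can confirm which words are dropped on your cluster. The second request is only an illustration of the suggested change. The index name movies_v2 and the analyzer name lowercase_folding_analyzer are made up for this example, and it assumes you are able to create a new index and reindex your documents into it, since the analyzer of an existing completion field cannot simply be swapped in place.

```
# Inspect the tokens produced by the index-time analyzer from the question.
GET /movies/_analyze
{
  "analyzer": "lowercase_stop_analyzer",
  "text": "A LE MOUTON HUILE D'OLIVE"
}

# Hypothetical replacement index (name is an assumption): the index-time
# analyzer no longer removes stop words, and it applies asciifolding so that
# index-time and search-time folding agree.
PUT /movies_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "lowercase_folding_analyzer": {
          "type": "custom",
          "tokenizer": "lowercase",
          "filter": ["asciifolding"]
        },
        "lowercase_asciifolding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["asciifolding", "lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "brand_name_suggest": {
        "type": "completion",
        "analyzer": "lowercase_folding_analyzer",
        "search_analyzer": "lowercase_asciifolding",
        "preserve_separators": false,
        "preserve_position_increments": false,
        "max_input_length": 50
      }
    }
  }
}
```

With the stop filter gone, LE stays in the indexed input, so amo should no longer be a prefix of the suggestion, while ale still would be because separators are not preserved. Running the same _analyze request against the new analyzer is a quick way to verify this before reindexing.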