Turkish character problem in Elasticsearch

When I search with Turkish characters in Elasticsearch, I get no matches. For example, searching for "yazilim" returns results, but searching for "Yazılım" returns nothing. The correct spelling is "Yazılım".

My index creation code:

var createIndexDescriptor = new CreateIndexDescriptor(INDEX_NAME)
    .Mappings(ms => ms
        .Map<T>(m => m
            .AutoMap()
            .Properties(pprops => pprops
                .Text(ps => ps
                    .Name("Title")
                    .Fielddata(true)
                    .Fields(f => f
                        .Keyword(k => k
                            .Name("keyword")))))))
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("tab_delim_analyzer", td => td
                    .Filters("lowercase", "asciifolding")
                    .Tokenizer("standard")))));

My search query code:

var searchResponse = eClient.Search<GlobalCompany>(s => s
    .Index(INDEX_NAME)
    .From(0)
    .Size(10)
    .Query(q => q
        .MultiMatch(m => m
            .Fields(f => f
                .Field(u => u.Title)
                .Field(u => u.RegisterNumber))
            .Type(TextQueryType.PhrasePrefix)
            .Query(value))));

There are 2 best solutions below

codeguy On

Your analyzer uses the asciifolding filter, which converts non-ASCII characters to their ASCII equivalents where one exists, so "ı" becomes "i" (see docs).

dadoonet On

You need to map your Title field as a text field (not only as a keyword field) and set the analyzer for this field to tab_delim_analyzer. Defining the analyzer in the index settings does not by itself apply it to any field.

I don't know how to translate this to the .NET world, but here is what I mean as a Kibana Dev Console script:

DELETE deneme
PUT deneme
{
  "settings": {
    "analysis": {
      "analyzer": {
        "tab_delim_analyzer": {
          "type": "custom", 
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "Title": {
        "type": "text",
        "analyzer": "tab_delim_analyzer"
      }
    }
  }
}
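
For the .NET side, the mapping above might translate roughly as follows. This is only a sketch, assuming NEST 7.x (`client.Indices.Create` and `client.Indices.Analyze`), a local cluster at http://localhost:9200, and a minimal `GlobalCompany` class with only the two properties used in the question's search query; none of those details are confirmed by the question. It uses `.Name(n => n.Title)` instead of the string "Title" so that the mapped field name goes through the same field-name inference as `.Field(u => u.Title)` in the search, which by default camel-cases it to "title".

```csharp
using System;
using Nest;

// Assumed shape of the document type; only the properties used in the question.
public class GlobalCompany
{
    public string Title { get; set; }
    public string RegisterNumber { get; set; }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: a local single-node cluster; replace with your own URI.
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"));
        var client = new ElasticClient(settings);

        var createResponse = client.Indices.Create("deneme", c => c
            // Define the custom analyzer, as in the question.
            .Settings(st => st
                .Analysis(an => an
                    .Analyzers(anz => anz
                        .Custom("tab_delim_analyzer", td => td
                            .Tokenizer("standard")
                            .Filters("lowercase", "asciifolding")))))
            // Reference the analyzer on the Title field; this is the step
            // missing from the question's mapping. Fielddata is left out
            // here to mirror the Dev Console mapping above.
            .Map<GlobalCompany>(m => m
                .AutoMap()
                .Properties(p => p
                    .Text(t => t
                        .Name(n => n.Title)
                        .Analyzer("tab_delim_analyzer")
                        .Fields(f => f
                            .Keyword(k => k.Name("keyword")))))));

        Console.WriteLine(createResponse.IsValid);

        // Check which tokens the analyzer produces for the Turkish spelling.
        var analyzeResponse = client.Indices.Analyze(a => a
            .Index("deneme")
            .Analyzer("tab_delim_analyzer")
            .Text("Yazılım"));

        foreach (var token in analyzeResponse.Tokens)
            Console.WriteLine(token.Token);
    }
}
```

With this analyzer, the analyze call should print the single token yazilim: the lowercase filter turns "Y" into "y" and asciifolding turns "ı" into "i". Because a text field uses its index analyzer at search time unless a search analyzer is set, a query for "Yazılım" and a query for "yazilim" would then both be reduced to the same token. Documents indexed before the mapping change need to be reindexed for this to take effect.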