I am trying to run language detection on several groups of fields in documents submitted to Solr. This is the language identification part of my solrconfig.xml:
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <str name="langid.fl">title,description</str>
    <str name="langid.langField">language</str>
    <str name="langid.langsField">languages</str>
    <bool name="langid.map">true</bool>
    <bool name="langid.map.keepOrig">true</bool>
    <str name="langid.whitelist">cjk,ckb,ar,bg,ca,cz,da,de,el,en,es,et,eu,fa,fi,fr,ga,gl,hi,hu,hy,id,it,ja,ko,lv,nl,no,pt,ro,ru,sv,tr</str>
    <str name="langid.fallback">tg</str>
    <bool name="langid.map.individual">true</bool>
    <str name="langid.map.individual.fl">websiteKeywords,websiteDescription,websiteTitle,websiteContent</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I detect the language of the document based on the title and description fields. That part works as expected: Solr creates two new fields, title_languagecode and description_languagecode, for example title_en and description_en. I also want to separately detect the language of websiteKeywords, websiteDescription, websiteTitle and websiteContent, and map those fields to websiteKeywords_en, websiteDescription_en, websiteTitle_en and websiteContent_en. With the current configuration, that does not happen: Solr only stores the original fields, that is websiteKeywords, websiteDescription, websiteTitle and websiteContent. What am I doing wrong? Your help is much appreciated. I am using Solr 8.9.0.
I misunderstood what langid.map.individual and langid.map.individual.fl represent. langid.map.individual.fl is a subset of langid.fl. If the specified fields are not in langid.fl, the setting has no effect. langid.map.individual refers to the fields in langid.fl, or to the subset of them specified in langid.map.individual.fl.
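Based on that, a minimal sketch of the adjusted settings might look like the fragment below. It assumes the fix is simply to add the four website fields to langid.fl; everything else in the chain stays as in the question. I would expect a side effect: the document-level language written to the language field is then probably detected from all six fields rather than from title and description only, so it is worth checking that this is acceptable.

```xml
<!-- Sketch only (assumption): the website fields are added to langid.fl
     so that langid.map.individual.fl is a subset of it. -->
<str name="langid.fl">title,description,websiteKeywords,websiteDescription,websiteTitle,websiteContent</str>
<bool name="langid.map">true</bool>
<bool name="langid.map.keepOrig">true</bool>
<!-- Detect and map each of the listed fields on its own. -->
<bool name="langid.map.individual">true</bool>
<str name="langid.map.individual.fl">websiteKeywords,websiteDescription,websiteTitle,websiteContent</str>
```

With these settings the website fields should be mapped to names such as websiteTitle_en according to their own detected language, while title and description continue to be mapped using the document-level language.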