I am new to Solr and am trying to find out what the usual solution is for the multi-word synonym problem with highlighting:
1. When we search for "hair brush", the word "toothbrush" is also highlighted, because toothbrush is in the synonyms.txt file:
HAIR BRUSH,HAIRBRUSH,HAIR-BRUSH,HAIRBRUSHES,HAIR BRUSHES
TOOTH BRUSH,TOOTHBRUSH,TOOTH-BRUSH,TOOTHBRUSHES,TOOTH BRUSHES
- Could you please let me know whether this happens because SynonymGraphFilterFactory is used at both index time and query time?
- If not, what needs to be done so that terms that do not match the query term are not highlighted?
The schema.xml configuration for the fieldType is as follows:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[({.,\[\]/})]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1" catenateAll="1" />
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[({.,\[\]/})]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt" />
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" />
  </analyzer>
</fieldType>
We are using Solr 6.5.1.
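One experiment that might help isolate the first question is to remove the synonym filter from the index analyzer and keep it at query time only. The following is only a sketch of that variant, not a confirmed fix. The field type name text_en_qsyn is hypothetical, every other setting is copied unchanged from the configuration above, and it assumes the affected documents are reindexed after the schema change.

```xml
<!-- Sketch only: "text_en_qsyn" is a hypothetical name, not part of the original schema. -->
<fieldType name="text_en_qsyn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[({.,\[\]/})]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- SynonymGraphFilterFactory intentionally omitted: no synonym mapping at index time. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" preserveOriginal="1" catenateAll="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[({.,\[\]/})]" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Synonyms stay at query time, with the same settings as the original query analyzer. -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>
```

If toothbrush is no longer highlighted with this variant, index-time synonym mapping is the likely cause. If the synonym graph filter does stay in the index analyzer, the Solr Reference Guide says it should be followed by solr.FlattenGraphFilterFactory, because the indexer cannot consume a token graph directly.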