I'm trying to manipulate incoming string values at index time using a regular expression.
Examples IN
stack\\overflow@exchange
hello\\world@all
foo\\bar@domain
Required OUT
overflow@exchange
world@all
bar@domain
As you can see, I would like to split the string at the \\ characters and keep only the part after them.
For indexing my values, I've defined the field:
<field name="_o_" type="string_t_owner" indexed="true" stored="true" multiValued="false" required="false"/>
And the corresponding field type:
<fieldType name="string_t_owner" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^(.*[\\\/]))" replacement=""/>
<tokenizer class="solr.StandardTokenizerFactory"/>
</analyzer>
</fieldType>
The intended procedure is:
- Search for the unwanted prefix (everything up to and including the last \\ or /)
- Replace it with nothing
- Tokenize the result (a tokenizer, or an analyzer class, is mandatory)
I've also tested the regex, and at least in the regex tester it looks good: https://regex101.com/r/1Qtrol/2
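For reference, here is a minimal sketch of the same check outside Solr. It assumes Python's `re` module behaves like Java's regex engine (which Solr uses) for this particular pattern, and it takes the sample values literally as written above. The helper name `strip_prefix` is hypothetical and exists only for illustration.

```python
import re

# Pattern copied from the charFilter definition. Solr evaluates it with
# Java's regex engine; Python's `re` is used here only as an approximation.
PATTERN = re.compile(r"(^(.*[\\\/]))")


def strip_prefix(value: str) -> str:
    """Remove everything up to and including the last backslash or slash."""
    return PATTERN.sub("", value)


# Sample values from the question, taken literally (raw strings).
samples = [
    r"stack\\overflow@exchange",
    r"hello\\world@all",
    r"foo\\bar@domain",
]

for sample in samples:
    print(sample, "->", strip_prefix(sample))
```

Because `.*` is greedy, the match runs up to the last separator, so it makes no difference whether the value contains one backslash or two.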
Unfortunately, reindexing my documents does not produce the needed output. The values are unchanged, and there is no error message at all.
Based on @MatsLindh's hint, here is some additional information. Putting the following document into the index:
{
"_o_":"stack\\overflow@exchange",
"description":"Testing Indexing",
"id":"1234"
}
I get the following indexed document:
{
"_o_":"stack\\overflow@exchange",
"description":"Testing Indexing",
"id":"1234"
}
The Analysis page suggests that the replacement is being applied, but for some reason the manipulated value is not what ends up stored.

Is there anything I've missed for actually storing the manipulated field value?