Use substring defined by regex for indexing in Solr


I'm trying to manipulate incoming string values at index time using a regular expression.

Examples IN

  • stack\\overflow@exchange
  • hello\\world@all
  • foo\\bar@domain

Required OUT

  • overflow@exchange
  • world@all
  • bar@domain

As you can see, I would like to split the string at the \\ characters and keep only the part after them. To index my values, I've defined the field:

<field name="_o_" type="string_t_owner" indexed="true" stored="true" multiValued="false" required="false"/>

And the corresponding field type:

<fieldType name="string_t_owner" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(^(.*[\\\/]))" replacement=""/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
</fieldType>

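The intended procedure is:

  1. Search for the unwanted prefix (everything up to and including the last backslash or slash).
  2. Replace it with an empty string.
  3. Pass the result to a tokenizer (an analyzer requires either a tokenizer or a class).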

I've also tested the regex, and at least in the regex tester it looks good: https://regex101.com/r/1Qtrol/2
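The same check can be reproduced outside Solr. The snippet below is only a minimal sketch: it uses Python's `re` module as a stand-in for the Java regex engine that Solr uses (assuming both treat this particular pattern the same way), and the helper name `strip_prefix` is made up for illustration.

```python
import re

# Same pattern as in the charFilter definition. Python's re module is used
# here only as a stand-in for Java's regex engine (assumption: both engines
# handle this pattern identically).
PATTERN = re.compile(r"(^(.*[\\\/]))")


def strip_prefix(value: str) -> str:
    """Remove everything up to and including the last backslash or slash."""
    return PATTERN.sub("", value)


# "\\" in a normal Python string literal is a single backslash, which is
# also what the JSON value "stack\\overflow@exchange" decodes to.
samples = [
    "stack\\overflow@exchange",
    "hello\\world@all",
    "foo\\bar@domain",
]

for sample in samples:
    print(sample, "->", strip_prefix(sample))
```

Because `.*` is greedy, the match extends to the last backslash, so the result is the same whether the input contains one backslash or two.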

Unfortunately, reindexing my documents does not produce the required output. The values are unchanged, and there is no error message at all.

Following a hint from @MatsLindh, here is some additional information. Putting the following document into the index:

{
    "_o_":"stack\\overflow@exchange",
    "description":"Testing Indexing",
    "id":"1234"
}

I get the following indexed document:

{
    "_o_":"stack\\overflow@exchange",
    "description":"Testing Indexing",
    "id":"1234"
}

On the Analysis page the char filter does appear to take effect (screenshot: Analysis Results), but for some reason the manipulated value is not what gets stored.

Is there anything I've missed that is needed to actually store the manipulated field value?
