How do I get empty fields in SOLR indexed for a schemaless collection?


How do I get empty fields indexed in Solr? I am using Solr 7.2.0.

I am using schemaless SOLR to try to index everything as string, but for files with empty fields, those fields do not get indexed. Is there a way to get them to show up?

col1,col2,col3
a,,1
d,e,
g,h,3

For example, the first data row shows up as:

{
"col1":"a",
"col3":"1",
}

I'm trying to get col2 to show up as well. In my solrconfig.xml I have this:

  <dynamicField name="*" type="text_general" indexed="true" stored="true" required="true" default="" />

I have also removed every trace of the remove-blank processor from my config, and I've reloaded and deleted/recreated my collection multiple times. Is there a solution for this?

There are 2 best solutions below

2
MatsLindh On BEST ANSWER

The CSV import module has its own option to keep empty fields - f.<field name>.keepEmpty=true.

If you don't give that option, the CSV handler will never give the empty field value to the next step in your indexing process.

Passing f.col2.keepEmpty=true as a URL parameter should at least give you a better starting point.
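As a rough illustration, the request URL with one keepEmpty parameter per field could be assembled like this. The host, port, and collection name (`mycollection`) are assumptions for the sketch, not values from the question, and `build_csv_update_url` is a hypothetical helper:

```python
from urllib.parse import urlencode

def build_csv_update_url(base_url, collection, empty_fields):
    """Build a Solr update URL that asks the CSV loader to keep empty values.

    base_url and collection are placeholders; adjust them to your setup.
    """
    params = [("commit", "true")]
    # One f.<field name>.keepEmpty=true parameter per column that may be empty.
    params += [(f"f.{name}.keepEmpty", "true") for name in empty_fields]
    return f"{base_url}/solr/{collection}/update?{urlencode(params)}"

url = build_csv_update_url("http://localhost:8983", "mycollection", ["col2", "col3"])
print(url)
```

The CSV file would then be POSTed to that URL with a CSV content type (for example with curl's --data-binary option), the same way as any other CSV upload to the update handler.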

0
Persimmonium On

Maybe preprocess your CSV file like this:

s/,,/, ,/g

That is, add a space between the two commas (you will have to handle an empty last value separately, though; there is a regex for that too).

Then try again. Right now Solr reads the value as nonexistent; turning it into a space gives it a better chance of making it through, and should not change search results (unless you have some unusual analysis chains).
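Note that a single pass of the substitution above leaves some cases untouched: an empty last column has no second comma to match, and a run of consecutive empty fields such as `,,,` becomes `, ,,`. A minimal sketch of the same idea using Python's csv module, which handles both cases; the function name `pad_empty_fields` and the single-space filler are illustrative choices, not part of the original answer:

```python
import csv
import io

def pad_empty_fields(csv_text, filler=" "):
    """Return csv_text with every empty field replaced by filler."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Replace only truly empty values; everything else passes through unchanged.
        writer.writerow([value if value != "" else filler for value in row])
    return out.getvalue()

sample = "col1,col2,col3\na,,1\nd,e,\ng,h,3\n"
padded = pad_empty_fields(sample)
print(padded)
```

Parsing the file as CSV rather than pattern-matching on commas also keeps quoted values that contain commas intact.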