SolrCloud /select returns a different result than the number of documents reported as processed


Abnormal behavior when running SolrCloud. Problem: DIH reports that x documents were processed, but a query always returns fewer than x (generally x-1 or x-2).

  • solr-9.0.0
  • openjdk-11.0.2
  • heap-memory: default

Steps to reproduce the problem:

  1. Start SolrCloud with at least 3 instances (I used 4 instances).
  2. Run 3 ZooKeeper instances.
  3. Configure DIH against SQL Server. => The SQL table holds the physical file paths of the documents; DIH imports each path, and a transformer class reads the file and sends its content to Solr. => The documents are txt files only.

The collection is set up with:

  1. 1 shard
  2. a replication factor of 2 (1 NRT and 1 TLOG)

So in this case 2 instances would have a leader replica and 2 would have a non-leader replica.

Documents: around 100k. Most of the documents are smaller than 10 KB, and around 100 documents are between 10 and 60 MB.

  1. Start indexing on any node (using SolrJ or the admin UI).
  2. While indexing is in progress, consider the scenario where multiple nodes crash or are restarted; in a 4-instance Solr cluster, most probably 2. => These 2 restarted instances are not in the same shard, so indexing can continue, and neither is the node where indexing was started. => The motive is to take down the leader replica node to check fault tolerance.

Here is the problem:

  1. When indexing completes, it returns the status fetched: X, processed: Y. Say that for 100k documents it reports fetched: 100k, processed: 100k. => A /select query for * then returns 1 document fewer, e.g. total number of found documents: 99,999. To cross-check, I collected all the document ids and compared them with the /select result set, and found 1 id that was not indexed. => I've tried this multiple times, and every time a random number of ids fail to get indexed, even though the status says they were processed.

  2. When the leader replica node goes down, electing another replica as leader takes too much time.

No mention of the missing document is made in the solr.log file.
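The cross-check in point 1 amounts to a set difference between the source ids and the ids returned by /select. A minimal sketch, assuming both id lists have already been exported (for example from the SQL table and from a query that returns only the id field); the function name and the sample data are hypothetical:

```python
def find_missing_ids(source_ids, indexed_ids):
    """Return the ids present in the source but absent from the index, sorted."""
    return sorted(set(source_ids) - set(indexed_ids))


# Hypothetical sample data: 10 source ids, one of which never reached the index.
source_ids = [str(i) for i in range(1, 11)]
indexed_ids = [doc_id for doc_id in source_ids if doc_id != "7"]

missing = find_missing_ids(source_ids, indexed_ids)
print(f"{len(missing)} id(s) missing from the index: {missing}")
```

Running the same comparison after each test run shows whether the missing ids differ from run to run, which is what I observe.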

Here is what I think happens: if a node goes down at the same moment the tlog file is being written, the document cannot be processed and is never even written to the tlog file. Let me know if I'm missing something.

All the configuration is the default one, except that I manually added DIH to Solr 9.0 running in the Jetty server.
