I'm trying to merge two Solr core indexes into a new one using org/apache/lucene/misc/IndexMergeTool.
All indexes are saved on HDFS under path /apps/solr/data/collection_name/data/index.
So I've created a new collection, say col_new, and I'm trying to merge core_1 and core_2 of col_1 into it.
The command I'm using is the following:
""" java -cp /usr/cloudera-hdp-solr/5.0.0.5-301/cloudera-hdp-solr/solr/server/solr-webapp/webapp/WEB-INF/lib/lucene-core-7.4.0.jar:/usr/cloudera-hdp-solr/5.0.0.5-301/cloudera-hdp-solr/solr/server/solr-webapp/webapp/WEB-INF/lib/lucene-misc-7.4.0.jar org/apache/lucene/misc/IndexMergeTool -destDir hdfs://namenode/path_to_new_core/data/index -srcDir hdfs://namenode/path_to_old_core_1/data/index hdfs://namenode/path_to_old_core_2/data/index """
The behaviour is strange. It creates a folder named hdfs: and two others named -srcDir and -destDir.
Does anyone have experience merging indexes stored on a shared file system?
Other details:
- Solr version 7.4
- HDP v3
- Lucene 5.0.0
Thanks.
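The problem may be the directory type that IndexMergeTool uses to read and write index files. I am not sure about all versions, but the latest version uses FSDirectory to access the files. As far as I can tell, the tool also treats every argument as a plain local path, which would explain the local folders named hdfs:, -srcDir and -destDir.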
FSDirectory has a few implementations, but all of them work with local file systems, not with HDFS. To access HDFS, the tool would have to use HdfsDirectory.
It looks like IndexMergeTool can't help you merge files stored on HDFS, but you can implement your own merger using HdfsDirectory.
It is also important to note that writing directly to HDFS may cause performance problems if your indexes contain a lot of files. Sometimes it may be faster to merge the indexes locally and then copy the resulting index to HDFS.
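For the custom merger mentioned above, a minimal sketch could look like the following. It mirrors what IndexMergeTool does (add the source indexes, then force-merge), but opens every index through Solr's HdfsDirectory instead of FSDirectory. The class name HdfsIndexMerger, the helper sourcePaths and the positional argument order are my own choices for illustration, and I am assuming that the Hadoop configuration (core-site.xml, hdfs-site.xml) is on the classpath and that nothing else is writing to the destination index while this runs.

```java
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.store.Directory;
import org.apache.solr.store.hdfs.HdfsDirectory;

// Hypothetical helper class, not part of Lucene or Solr.
class HdfsIndexMerger {

    // Everything after the first argument is treated as a source index path.
    static String[] sourcePaths(String[] args) {
        return Arrays.copyOfRange(args, 1, args.length);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.err.println(
                "Usage: HdfsIndexMerger <mergedIndex> <index1> <index2> [index3] ...");
            System.exit(1);
        }

        // Assumption: picks up the cluster settings from the classpath.
        Configuration conf = new Configuration();

        String[] sources = sourcePaths(args);
        Directory[] indexes = new Directory[sources.length];

        // OpenMode.CREATE overwrites whatever index already exists at the destination.
        try (Directory merged = new HdfsDirectory(new Path(args[0]), conf);
             IndexWriter writer = new IndexWriter(
                 merged, new IndexWriterConfig(null).setOpenMode(OpenMode.CREATE))) {

            for (int i = 0; i < sources.length; i++) {
                indexes[i] = new HdfsDirectory(new Path(sources[i]), conf);
            }

            writer.addIndexes(indexes); // copy the segments of all source indexes
            writer.forceMerge(1);       // optional: collapse them into one segment
        } finally {
            for (Directory dir : indexes) {
                if (dir != null) {
                    dir.close();
                }
            }
        }
    }
}
```

It would be run with the destination first and the sources after it, for example with hdfs://namenode/path_to_new_core/data/index followed by the two old core paths from the question. The classpath needs the Solr core jar and the Hadoop client jars in addition to lucene-core. The forceMerge(1) call can be dropped if rewriting all segments on HDFS turns out to be too slow.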