I have a problem with Whoosh. I want to build an index over several separate runs, because the query that extracts the data is heavy. I have fixed almost all the problems, but I can't get past one: every time I reopen the index to add new documents, the existing content is wiped instead of the new documents simply being added. I tried using update_document instead of add_document, and FileStorage.open_index instead of index.open_dir, but nothing changed: the index files always end up much smaller than expected.
```python
import os
from shutil import rmtree

from whoosh import index
from whoosh.filedb.filestore import FileStorage

# dirname, indexname and is_new_index_file stand in for my real values;
# TranslationSchema is my schema class (defined elsewhere).
if is_new_index_file:
    # Start from scratch: delete any previous index directory, then recreate it
    if os.path.isdir(dirname):
        rmtree(dirname)
    os.mkdir(dirname)
    schema = TranslationSchema()
    ix = index.create_in(dirname, schema, indexname=indexname)
else:
    # Open an existing index object
    # ix = index.open_dir(dirname, indexname=indexname)
    # Open it through the file storage instead
    storage = FileStorage(dirname)
    ix = storage.open_index(indexname=indexname)
...
fields = ...  # <query-to-the-database-to-extract-fields>: dict of field name -> value
...
writer = ix.writer()
# writer.add_document(**fields)
writer.update_document(**fields)
writer.commit(merge=False, optimize=True)
ix.close()
```