I am planning to use Solr as a quasi-database to provide fast search functionality for text data. The maximum expected number of records is around 1 million, ~50 KB each. Additionally, a relatively large number of user-defined custom fields are expected, necessitating frequent reindexing.
Of course, one cannot simply run something like:
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/my_core")  # placeholder core URL
response = solr.search(q="*:*", rows=1_000_000)  # rows defaults to only 10
all_data = [doc for doc in response]
solr.delete(q="*:*")
solr.commit()
for doc in all_data:
    solr.add([doc])  # pysolr's add() expects a list of documents
solr.commit()
This raises the question: what are common reindexing strategies using Python, and how long would they take to execute for the data volume described above? Can I create a new core with the same schema, reindex into it, and then delete the old core?
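To make that second option concrete, here is a rough sketch of what I imagine it would look like, assuming pysolr on a standalone (non-SolrCloud) Solr with the stock CoreAdmin handler; the core names, the URLs, the batch size, and the id uniqueKey field are all placeholders, and I may be misremembering parts of pysolr's API:

import pysolr

OLD_CORE_URL = "http://localhost:8983/solr/texts_v1"  # placeholder names
NEW_CORE_URL = "http://localhost:8983/solr/texts_v2"  # second core, same schema
admin = pysolr.SolrCoreAdmin("http://localhost:8983/solr/admin/cores")

old_core = pysolr.Solr(OLD_CORE_URL)
new_core = pysolr.Solr(NEW_CORE_URL)

def reindex(batch_size=1000):
    # Stream all documents out of the old core with cursorMark deep paging
    # and add them to the new core in batches.
    cursor = "*"
    while True:
        results = old_core.search(
            q="*:*",
            rows=batch_size,
            sort="id asc",  # cursorMark requires a sort on the uniqueKey
            cursorMark=cursor,
        )
        batch = []
        for doc in results:
            doc.pop("_version_", None)  # avoid optimistic-concurrency clashes
            batch.append(doc)
        if batch:
            new_core.add(batch)
        next_cursor = results.nextCursorMark
        if next_cursor == cursor:  # cursor stops advancing once everything is read
            break
        cursor = next_cursor
    new_core.commit()

reindex()
admin.swap("texts_v1", "texts_v2")  # searches on the old name now hit the new index
# admin.unload("texts_v2")          # optionally drop the old index afterwards

As far as I understand, this only copies stored fields, so anything indexed but not stored would be lost, which is presumably why reindexing from the original source data is usually recommended. Is this swap-based approach reasonable at this scale, or is there a better-established pattern?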