Is it possible to use Whoosh to search for documents that do not exactly match the query, but are very close to it? For example, a query that would find something if just one of its words were dropped.
I wrote some simple code that works as long as every word of the query appears in all of the documents:
import os.path
from whoosh.fields import Schema, TEXT
from whoosh.index import create_in, open_dir
from whoosh.qparser import QueryParser

if not os.path.exists("index"):
    os.mkdir("index")

schema = Schema(title=TEXT(stored=True))
ix = create_in("index", schema)
ix = open_dir("index")

writer = ix.writer()
writer.add_document(title=u'TV Ultra HD')
writer.add_document(title=u'TV HD')
writer.add_document(title=u'TV 4K Ultra HD')
writer.commit()

with ix.searcher() as searcher:
    parser = QueryParser('title', ix.schema)
    myquery = parser.parse(u'TV HD')
    results = searcher.search(myquery)
    for result in results:
        print(result)
Unfortunately, if I change the query to one of the queries below, I no longer find all 3 documents (or I find none at all):
myquery = parser.parse(u'TV Ultra HD') # 2 Hits
myquery = parser.parse(u'TV 4K Ultra HD') # 1 Hit
myquery = parser.parse(u'TV HD 2022') # 0 Hits
Is it possible to create a parser so that any of these queries still returns all 3 documents, even if the title field is slightly different?
After some thought, I fell back on the usual brute-force enumeration of all combinations of the query words.
I added a variable, tolerance: this is the maximum number of words that can be cut from the original query. I also added a separate method, getResults(words, tolerance). The final code is:
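Roughly, the enumeration can be sketched as follows. The helper name query_variants is made up for illustration and is not the getResults method itself: it only builds the shortened queries, assuming that each one is then parsed and searched exactly as in the code above and that the hits are merged.

```python
from itertools import combinations


def query_variants(words, tolerance):
    """Yield query strings with up to `tolerance` words dropped.

    Hypothetical helper for illustration only; the full query comes
    first, followed by progressively shorter variants.
    """
    n = len(words)
    # Never drop every word: each variant keeps at least one term.
    max_drop = min(tolerance, n - 1)
    for drop in range(max_drop + 1):
        # combinations() keeps the words in their original order.
        for kept in combinations(words, n - drop):
            yield " ".join(kept)


variants = list(query_variants(["TV", "HD", "2022"], tolerance=1))
print(variants)
```

With tolerance=1 the three-word query expands into the original query plus three two-word queries, and the number of variants grows quickly as the query gets longer or the tolerance increases.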
The result is 3 Hits:
But I consider this a poor solution, because it seems to me that Whoosh should allow this to be implemented much more concisely.