I am working on a word puzzle game solver that needs to determine whether a candidate word is valid. All of the valid words are loaded from a text file and stored in a trie; building this trie (inserting around 370,000 words) takes around 5 seconds on my machine.
The main part of my program takes an image of a puzzle (specifically a codeword puzzle), interprets it, and solves it with a depth-first search. The problem is that the trie is rebuilt every time a user wants to solve a new image, even though it never changes. Everything else, from loading the puzzle image to producing the finished result, takes around 1 second, so waiting 5 seconds to rebuild the trie each time is unacceptable.
My current workaround is to partition the main script into sections so that the trie-building code can be run separately, using #%% cell markers in the Spyder IDE (similar to code sections in MATLAB or cells in a Jupyter notebook).
Here is how I have things structured right now:
# codeword_solver_trie.py
class TrieNode:
    def __init__(self):
        self.children = {}    # maps character -> child TrieNode
        self.is_word = False  # True if a stored word ends at this node

class WordDictionary:
    def __init__(self):
        self.root = TrieNode()

    def addWord(self, word):
        # Walk down the trie, creating nodes as needed, and mark the final node
        current_node = self.root
        for character in word:
            current_node = current_node.children.setdefault(character, TrieNode())
        current_node.is_word = True

    def search(self, word):
        # Depth-first search through the trie; "." matches any single character
        def dfs(node, index):
            if index == len(word):
                return node.is_word
            if word[index] == ".":
                for child in node.children.values():
                    if dfs(child, index + 1):
                        return True
            if word[index] in node.children:
                return dfs(node.children[word[index]], index + 1)
            return False
        return dfs(self.root, 0)
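For context, the "." in search acts as a single-character wildcard. A quick sanity check of the interface (not part of the solver, just to illustrate how it is used) looks like this:

wd = WordDictionary()
wd.addWord("apple")
wd.addWord("apply")
print(wd.search("apple"))   # True: exact match
print(wd.search("appl."))   # True: "." matches any single character
print(wd.search("appl"))    # False: a prefix of stored words, not a stored word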
I have another file where the main processing occurs. It contains this function:
def build_word_dictionary_trie(words):
    print("Building trie...")
    word_trie = WordDictionary()
    for word in words:
        word_trie.addWord(word)
    return word_trie
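The load_all_words helper referenced in the main script below isn't shown here; it just reads the word list from disk. A minimal version, assuming words_alpha.txt contains one word per line, would be something like:

def load_all_words(fpath):
    # Read one word per line, dropping blank lines and trailing whitespace
    with open(fpath) as f:
        return [line.strip() for line in f if line.strip()]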
The relevant part of the main script is as follows:
word_list_fpath = "data/words_alpha.txt"
all_words = load_all_words(word_list_fpath)
# Build the trie from the word list
word_trie = build_word_dictionary_trie(all_words)
#%% (this separates the execution of the code above from the code below)
# Load image from file and convert from BGR to RGB
original_img = cv2.imread(f"data/codeword_images/{img_num}.jpg")
# ...
How can I build this trie exactly once and let the user re-run the processing an arbitrary number of times, without resorting to this hack of sectioning the code with #%%?