I'm trying to create a code to find the optimal solution for scrabble. And i need help with generating all of the word combinations with a set of letters. (To clarify: i do not need help validating if the words are in a dictionary or not. I just want to generate all of the different combinations). Anyway, in scrabble there is a blank square/wild card which can be any of the valid letters, in my python code it is represented by '_'. Since the wild cards value is zero its important to keep track of what words originates from a wild card, therefore all wildcard words in my code is uppercase. It is also imported to keep in mind that the function does not return a list with any words that have a uppercase in it that can be also written in lowercase with the same set of letters, since i only want to keep highest scoring word combinations. Also keep in mind that it is possible to get multiple wild cards in scrabble (Though unlikely).
This is my fastest version of the code so far. But still in a worst case scenario with 5 letters and 2 wild cards it takes over a minute to run. Most of this time is spend removing the uppercase duplicates, so i have focused my optimization effort on that part. So far I have added numba, binary search (to speed up the processes of checking if the lowercase word already exists) and list comprehension but i still only got none to minor improvement. I think the only possible way to speed up my code is to use a another algorithm.
import numpy as np
import itertools, numba, time
def find_all_word_combinations(letters, wild_card='_', valid_letters='abcdefghijklmnopqrstuvwxyzæøå'):
valid_letters = valid_letters.upper()
return remove_uppercase_duplicates([
''.join(permutation) for replaced_words in replace_wild_card_with_letters(letters, wild_card, valid_letters)
for length in range(len(replaced_words))
for combination in itertools.combinations(replaced_words, length + 1)
for permutation in itertools.permutations(combination)])
def replace_wild_card_with_letters(word, wild_card='_', valid_letters='abcdefghijklmnopqrstuvwxyzæøå'):
if wild_card not in word:
return [word]
valid_letters = valid_letters.upper()
where = [i for i, x in enumerate(word) if wild_card == x]
blanks_filled = [''.join(perm) for perm in itertools.combinations_with_replacement(valid_letters, word.count(wild_card))]
result = []
for letters in blanks_filled:
new_word = list(word)
for index, letter in enumerate(letters):
new_word[where[index]] = letter
result.append(''.join(new_word))
return result
@numba.njit(fastmath=True)
def binary_search_check(arr:list, item:str)->bool:
left = 0
right = len(arr) - 1
while left <= right:
mid = (left + right) // 2
if arr[mid] == item:
return True
elif arr[mid] > item:
right = mid - 1
else:
left = mid + 1
return False
@numba.njit(fastmath=True)
def remove_uppercase_duplicates(words):
words = np.sort(words)
return [word for word in words if not binary_search_check(words, word.lower())]
start = time.perf_counter()
print(len(find_all_word_combinations('abcde__')))
end = time.perf_counter()
print(f'Finished in {end-start} seconds.')
Edit: The code i first provided is wrong, the remove_uppercase_duplicates function does not function properly. The function is supposed to delete all duplicates and remove all uppercase words that have a lower case duplicate. Since someone has already posted a better answer i'm not going to fix the original code (i am also to lazy ¯\(ツ)/¯). Feel free to fix my error and post it. I also fixed some grammar.
A couple strategies might help here. Two goals should stand out real quick:
You really really want to avoid sorting the result, which is a huge overhead, especially when following it up with a binary search executed many times
You really (just one "really") want to work with sets rather than lists here to avoid duplicates and make member checking rather instantaneous.
So here is a solution that does that. Note: There is probably a little bit left to do here if you really dig in on the wildcard replacement piece, but I'm not sure it is worth the effort. I didn't tweak your replacement function much. This computes all words in less than 2 secs.
Also note: I didn't QA this too much, you should probably do some set comparisons with the current method to ensure that the results match expectations.
Edit: 2 corrections. First, orig post had "permutations within the permutation" which is unnecessary and removed. Second, I neglected the fact that I reduced the size of
lettersby the number of wildcards, which affects the length limit. Both fixed, new total is ~2.3secCode:
Output (edited after code update):