find anagram phase in dictionary

25 Views Asked by At

I want to find the anagram phase from the dictionary which checks it against the word mentioned by the users. A few examples, have shared ::

Example:

i/p --> Barbara Bush, o/p {"abash", "bar", "rub"}

i/p --> Clint Eastwood, o/p {"old", "west", "action"}

1

There are 1 best solutions below

0
Stef On

Assuming you already have a list all_words of all English words:

  • Building the anagram datastructure: First build a hashmap that maps (multi)sets of letters to the list of all anagrams (I mean exact anagrams, not sub-anagrams like in your example);
  • Then for every input string: iterate on all submultisets of letters of your input string, and pull all corresponding entries from the hashmap.

Below in python.

Step 1: Building the anagram datastructure

all_words = [
    'aa', 'aah', 'aahed', 'aahing', 'aahs',
    'aal', 'aalii', 'aaliis', 'aals', 'aardvark',
    ...,
    'zymotically', 'zymotics', 'zymurgies', 'zymurgy', 'zythum',
    'zythums', 'zyzzyva', 'zyzzyvas', 'zzz', 'zzzs'
]

def build_anagram_dict(all_words):
    anagram_dict = {}
    for word in all_words:
        anagram_dict.setdefault(''.join(sorted(word)), []).append(word)
    return anagram_dict

anagram_dict = build_anagram_dict(all_words)
#   {...,
#   aeinrstu: ['ruinates', 'taurines', 'uranites', 'urinates'],
#   ekrsy: ['rykes', 'skyer', 'skyre', 'syker', 'yerks'],
#   orty: ['ryot', 'tory', 'troy', 'tyro'],
#   aeilnssst: ['saintless', 'saltiness', 'slatiness', 'stainless'],
#   aelrssv: ['salvers', 'servals', 'slavers', 'versals'],
#   aelssv: ['salves', 'selvas', 'slaves', 'valses'],
#   aenss: ['sanes', 'seans', 'senas', 'sensa'],
#   alnost: ['santol', 'stanol', 'talons', 'tolans'],
#   deenrsstu: ['sederunts', 'undersets', 'undeserts', 'untressed'],
#   ...}

Step 2: Searching for anagrams of submultisets

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

def anagrams(input_word, anagram_dict):
    yield from chain.from_iterable(
        anagram_dict.get(''.join(k), ())
        for k in set(powerset(sorted(input_word.strip().lower())))
    )

Testing:

clint = list(anagrams('clinteastwood', anagram_dict))
barb = list(anagrams('barbarabush', anagram_dict))

print(len(clint))
# 2661

print(clint[:10] + ['...'] + clint[-10:])
#['soote', 'li', 'dwile', 'wield', 'wiled', 'eidolon', 'oceloid', 'cine', 'nice', 'toit', '...', 'wasted', 'lost', 'lots', 'slot', 'dawns', 'wands', 'do', 'od', 'scailed', 'tilt']

print(len(barb))
# 132

print(barb[:10] + ['...'] + barb[-10:])
# ['abba', 'baba', 'sau', 'bubs', 'aua', 'bahus', 'habus', 'subah', 'subha', 'baur', '...', 'auras', 'hurras', 'aura', 'bah', 'rah', 'bash', 'ur', 'arb', 'bar', 'bra']