I'm looking for the most efficient way of solving an Levenshtein edit distance problem.
We are given as input:
- A set of strings
Sof sizen<= 8, with average lengthm<= 50- A target string
tof lengthl<= 50
Our task is to 'align' t with S i.e:
Find the subset
s*ofS, whereconcat(s*)has the minimum Levenshtein edit distance (among all the subsets) witht
Some thoughts:
- For 2 strings of length
l1andl2, we could use the standard dynamic programming algorithm which hasO(l1*l2)time complexity - Brute forcing this would require us to compute
editDistance(t, concat(s')) for each subset s'. This would be approximatelyO(l.m.n!) - This could be optimized a bit by memoizing the results i.e if we are computing
editDistance(t, S[1, 2, 3, 4])we could re-use the computation fromS[1, 2, 3]) - The other option I could think of is to construct a Trie or DAWG (Directed Acyclic Word Graph)
But, I'm not an expert on this, so I might be missing a better solution. How would we do this efficiently?
Thanks in advance!
Outline of a memoize solution. (I just didn't do levenshtein for you.)