Best method in Python to find a longest string subset against a list of multiple options per character

177 Views Asked by kedingt At 06 July 2021 at 15:49

I have a simple string, and a list of sets, where each set is a position with 2 possible characters, which looks something like:

"AGTCG"

[('A', 'T'), ('C', 'B'), ('G', 'T'), ('T', 'X'), ... ]

Where I want to find the longest match. In this example it would be "TCG". Each set will never have more than 2 characters. The best solution I have come up with is generating every single possible string using the combinations of characters (ACGT..., ACGX..., ACTT..., etc.), then using difflab SequenceMatcher.find_longest_match and finding the largest results. I suspect there's a better method, but struggling to find other options. Is there a better way?

Original Q&A

There are 1 best solutions below

kedingt On 06 July 2021 at 18:06

Kind of resolved. Not an exact solution, but rather a solution to my larger goal, which allows be to bypass this step. The goal was to produce an alignment, so if the set only had one character per position such as ACGTACGT, the string 'AGTCG' would align as best as possible to produce A-GT-CG-. After reflecting, rather than looking for the largest match in a recursive algorithm, the best method would be to utilize a modified version of the Smith-Waterman algorithm.

https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm

Best method in Python to find a longest string subset against a list of multiple options per character

There are 1 best solutions below

Related Questions in PYTHON

Related Questions in STRING-COMPARISON

Related Questions in SEQUENCEMATCHER

Trending Questions

Popular # Hahtags

Popular Questions