Is a Generalized Suffix Tree a good data structure to use for string searches on a dict of strings where partial matches should also be returned?

29 Views Asked by zolo00 At 29 February 2024 at 22:56

I have a dictionary of strings that I would like to perform string searches on in real time (web application with approx. 1500 total users).

Background: I have a data table that follows the structure

C1	C2	C3	C4	C5	C6	C7-13
String	String	String	String	String	String	Bool

~5000 rows (n)
Longest String per column: 150 characters
~10000 Unique words
Alphabet: Printable ASCII characters
Each character in the alphabet is equally likely to appear in each string position.

Search Sudo Algorithm:

On app startup read in the table as a dict = {C1:[0,...n], C2[0,..., n], ...}. 'Data' argument in minimal ex.
User types a key (multi-word accepted) that is used to search through the dict values in columns C1-C6.
Return any rows that contains similar strings to the search.

Minimal Search Example:

from fuzzywuzzy import fuzz


def similarity_sort(key, data):
    result_rows = []
    temp = []

    for index in range(len(data['column 1'])):
        check_r = perform_check(key, "{} {} {} {} {} {}".format(data['column1'][index], ['column2'][index], data['column3'][index], data['column4'][index], data['column5'][index], data['column6'][index]))
        temp.append([index, check_r])

    for x in temp:
        if x[1] > 20:
            result_rows.append(x[0])
    result_rows.sort()
    return result_rows


def perform_check(term, row_value):
    check_ratio = fuzz.token_set_ratio(term.lower(), row_value.lower())
    return check_ratio

With Search Key: 'SQL'

Expected to match with column containing values such as:

SQL
MySQL
SQLdatabase
MySQLdatabase

My question is would a Generalized Suffix Tree be a good fit for my string search problem?. If not are there other data structures I should look into?

Original Q&A

Is a Generalized Suffix Tree a good data structure to use for string searches on a dict of strings where partial matches should also be returned?

There are 0 best solutions below

Related Questions in PYTHON

Related Questions in DATA-STRUCTURES

Related Questions in STRING-MATCHING

Related Questions in SUFFIX-TREE

Trending Questions

Popular # Hahtags

Popular Questions