I would like to scan a folder, pick up all the files ending with '.txt', and then build a data frame with an extra column that categorizes the files by similar names (a fuzzy match ratio >= 80).
import os
path = '../../../files'
text_files = [f for f in os.listdir(path) if f.endswith('.txt')]
text_files
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
s1 = "programmi.txt"
s2 = "programmi-2.txt"
fuzz.ratio(s1, s2)
The result I expect to see looks like this:

Here's a solution which uses two for loops to compare each file name with all the others and obtain the fuzz ratio needed for the categorisation.
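A minimal sketch of the two-loop idea, since it is easier to follow as code. It is not necessarily the exact code of this answer: to stay dependency-free it uses `difflib.SequenceMatcher` from the standard library as a stand-in for `fuzz.ratio` (scores can differ slightly from fuzzywuzzy's), and the names `ratio`, `categorize`, `threshold` and the `"file"`/`"category"` keys are hypothetical, as is the sample list of names.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity score from 0 to 100 (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def categorize(names, threshold=80):
    """Label every name with the first reference name it is similar to."""
    groups = {}
    for name in names:                  # outer loop: pick a reference name
        if name in groups:              # already assigned to an earlier group
            continue
        groups[name] = name             # the reference starts its own group
        for other in names:             # inner loop: compare with all the others
            if other not in groups and ratio(name, other) >= threshold:
                groups[other] = name
    # One row per file, ready to be turned into a data frame
    return [{"file": n, "category": groups[n]} for n in names]


# Hypothetical input; in the question this would be text_files
rows = categorize(["programmi.txt", "programmi-2.txt", "notes.txt"])
for row in rows:
    print(row)
```

Passing `rows` to `pandas.DataFrame(rows)` should give a frame with one row per file and the category in its own column. Here `programmi.txt` and `programmi-2.txt` end up in the same category, while `notes.txt` forms its own.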
Result:
Warning:
Please note that this approach has an order dependency: in the example below, comparing dict_cl.txt to the other names leads to only one match, while comparing dict_class12.txt to all the other names leads to 3 matches. For your use case, where we assume that the groups are very distinct from each other, this should not be a problem. However, this example shows that pairwise comparisons are a bit tricky in more sophisticated situations.
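The order dependency can be illustrated with a small sketch. Apart from dict_cl.txt and dict_class12.txt, the file names here are made up for illustration. The `ratio` and `matches` helpers are hypothetical, and `difflib.SequenceMatcher` again stands in for `fuzz.ratio`, so the exact scores may differ from fuzzywuzzy's.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity score from 0 to 100 (stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def matches(reference, names, threshold=80):
    """All names other than the reference that reach the threshold."""
    return [n for n in names
            if n != reference and ratio(reference, n) >= threshold]


# Hypothetical names: the last two are invented for this illustration
names = [
    "dict_cl.txt",
    "dict_class12.txt",
    "dict_class1234.txt",
    "dict_class123456.txt",
]

# The short name is only close enough to one other name ...
print(matches("dict_cl.txt", names))
# ... while the longer name reaches the threshold for all three others
print(matches("dict_class12.txt", names))
```

Which group dict_class1234.txt lands in therefore depends on which name happens to be used as the reference first: similarity above a threshold is not transitive.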