What distance measure can I use that factors in order?

81 Views Asked by Hummer At 11 September 2023 at 00:04

I have a few lists that havethe same IDs that are strings. They are as follows:

list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "1"]

What measure can I use to determine the lists that are closest to each other here in terms of order? Ideally list1 and list2 should be the closest here.

Does the spearman correlation make sense here?

Original Q&A

There are 2 best solutions below

wu1meng2 On 11 September 2023 at 00:27

Edit distance seems to be a good candidate for such a metric.

from typing import List


def calcEditDistance(lhs: List[str], rhs: List[str]) -> int:
    '''
    Dynamic programming
    dp[i][j] = minimum number of operations to convert lhs[0:i] to rhs[0:j]
    '''
    m = len(lhs)
    n = len(rhs)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        dp[i][0] = i

    for j in range(1, n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if lhs[i - 1] == rhs[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = min(dp[i - 1][j - 1], dp[i - 1]
                               [j], dp[i][j - 1]) + 1

    return dp[m][n]


list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "2"]

res = calcEditDistance(list1, list2)
print(f"dis[1, 2] = {res}")

res = calcEditDistance(list1, list3)
print(f"dis[1, 3] = {res}")

res = calcEditDistance(list1, list4)
print(f"dis[1, 4] = {res}")

res = calcEditDistance(list2, list3)
print(f"dis[2, 3] = {res}")

res = calcEditDistance(list2, list4)
print(f"dis[2, 4] = {res}")

res = calcEditDistance(list3, list4)
print(f"dis[3, 4] = {res}")

prints

dis[1, 2] = 2
dis[1, 3] = 4
dis[1, 4] = 4
dis[2, 3] = 4
dis[2, 4] = 4
dis[3, 4] = 3

which matches your intuition.

Note that in the Python code I use the Levenshtein distance where insert, delete, and replace operations are allowed. You can, of course, use other types of edit distance.

user2357112 On 11 September 2023 at 00:50

The comments have clarified that you're looking for something where two lists are closer together the more elements they have in the same positions. In that case, just count how many elements they have in different positions:

def distance(l1, l2):
    return sum(1 for i, j in zip(l1, l2) if i != j)

What distance measure can I use that factors in order?

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in DISTANCE

Related Questions in LEVENSHTEIN-DISTANCE

Trending Questions

Popular # Hahtags

Popular Questions