What distance measure can I use that factors in order?

81 Views Asked by At

I have a few lists that havethe same IDs that are strings. They are as follows:

list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "1"]

What measure can I use to determine the lists that are closest to each other here in terms of order? Ideally list1 and list2 should be the closest here.

Does the spearman correlation make sense here?

2

There are 2 best solutions below

0
wu1meng2 On

Edit distance seems to be a good candidate for such a metric.

from typing import List


def calcEditDistance(lhs: List[str], rhs: List[str]) -> int:
    '''
    Dynamic programming
    dp[i][j] = minimum number of operations to convert lhs[0:i] to rhs[0:j]
    '''
    m = len(lhs)
    n = len(rhs)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    for i in range(1, m + 1):
        dp[i][0] = i

    for j in range(1, n + 1):
        dp[0][j] = j

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if lhs[i - 1] == rhs[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = min(dp[i - 1][j - 1], dp[i - 1]
                               [j], dp[i][j - 1]) + 1

    return dp[m][n]


list1 = ["1", "2", "3", "4", "5"]
list2 = ["1", "2", "3", "5", "4"]
list3 = ["1", "5", "4", "3", "2"]
list4 = ["4", "2", "5", "3", "2"]

res = calcEditDistance(list1, list2)
print(f"dis[1, 2] = {res}")

res = calcEditDistance(list1, list3)
print(f"dis[1, 3] = {res}")

res = calcEditDistance(list1, list4)
print(f"dis[1, 4] = {res}")

res = calcEditDistance(list2, list3)
print(f"dis[2, 3] = {res}")

res = calcEditDistance(list2, list4)
print(f"dis[2, 4] = {res}")

res = calcEditDistance(list3, list4)
print(f"dis[3, 4] = {res}")

prints

dis[1, 2] = 2
dis[1, 3] = 4
dis[1, 4] = 4
dis[2, 3] = 4
dis[2, 4] = 4
dis[3, 4] = 3

which matches your intuition.

Note that in the Python code I use the Levenshtein distance where insert, delete, and replace operations are allowed. You can, of course, use other types of edit distance.

0
user2357112 On

The comments have clarified that you're looking for something where two lists are closer together the more elements they have in the same positions. In that case, just count how many elements they have in different positions:

def distance(l1, l2):
    return sum(1 for i, j in zip(l1, l2) if i != j)