How to compare values between columns in 2 DataFrames in Python Pandas?

51 Views Asked by At

I have 2 DataFrames in Python Pandas like below:

Input:

df1 = pd.DataFrame({"col1":["APPLE", "BANANA", "ORANGE"]})
df2 = pd.DataFrame({"col_x":["APPLEXX", "BANANA", "CARROT"]})

df1:

col1
------
APPLE
ORANGE
BANANA

df2:

col_x
--------
APPLEXX
BANANA
CARROT

Requirements:

And I need to print only rows from df2 (col_x), which: contain values from df1 (col1) as part or all of the value in df2 (col_x) and their equivalent in df1 (col1)

Desire output:

So, as an output I need something like below:

col1 col_x
APPLE APPLEXX
BANANA BANANA

How can I do that in Python Pandas ?

1

There are 1 best solutions below

3
RomanPerekhrest On

You can take advantage of assignment expressions (since python 3.8) and builtin next method on generator expression:

df = pd.DataFrame([[s, w] for w in df2['col_x']
                   if (s := next((x for x in df1['col1'] if x in w), None))])
print(df)

        0        1
0   APPLE  APPLEXX
1  BANANA   BANANA

For older python versions use the following generator function:

def match_gen(s1, s2):
    for word in s1:
        match = next((x for x in s2 if x in word), None)
        if match:
            yield (match, word)

df = pd.DataFrame(match_gen(df2['col_x'], df1['col1']))