The rule of the output columns in pandas.merge()

45 Views Asked by At

When I use pandas.merge, there's a strange output and I can't understand the real rules of output columns after calling merge().

The source code(pandas is v1.5.2):

import pandas as pd
import numpy as np

s1 = pd.Series([100, 200, 300], index=[4,5,6], name="A")
df2 = pd.DataFrame([[100,1,1,1],[200,2,2,2],[300,3,3,3]], index=[4,5,6],columns=list("ABCD"))
df3 = pd.merge(
    s1,
    df2,
    suffixes=("_left", "_right"),
    how="left",
    left_index=True,
    right_on="A",
)
print(df3)

The output is:

See the image

     A  A_left  A_right   B   C   D
NaN  4     100      NaN NaN NaN NaN
NaN  5     200      NaN NaN NaN NaN
NaN  6     300      NaN NaN NaN NaN

So here is the question, why the second column (labeled with A) has the values of the input series' index, and A is converted to a column, not index?

Further more, can anyone give a more detailed explanation of the rule of the output columns of the merge API?

I have read the docs but it's a little brief.

1

There are 1 best solutions below

2
Corralien On

I think you misunderstood the name of a Series:

>>> s1
4    100
5    200
6    300
Name: A, dtype: int64
#     ^--- A is the name of the values not the index

>>> s1.to_frame()
     A  # it becomes the name of the column when your convert it as DataFrame
4  100
5  200
6  300

>>> s1.rename_axis('MyIndex')
MyIndex  # now the index has a name
4    100
5    200
6    300
Name: A, dtype: int64

In your code you try to find a relation between the index of s1 (which has no name) and the 'A' column of df2. So you try to match [4, 5, 6] with [100, 200, 300] which doesn't make sense.

However, what you should do is:

>>> pd.merge(s1, df2, on='A', how='left', suffixes=("_left", "_right"))
     A  B  C  D
0  100  1  1  1
1  200  2  2  2
2  300  3  3  3

If you want to join by index, you have to use:

>>> pd.merge(s1, df2, left_index=True, right_index=True, how='left', suffixes=("_left", "_right"))
   A_left  A_right  B  C  D
4     100      100  1  1  1
5     200      200  2  2  2
6     300      300  3  3  3