When I use pandas.merge, there's a strange output and I can't understand the real rules of output columns after calling merge().
The source code(pandas is v1.5.2):
import pandas as pd
import numpy as np
s1 = pd.Series([100, 200, 300], index=[4,5,6], name="A")
df2 = pd.DataFrame([[100,1,1,1],[200,2,2,2],[300,3,3,3]], index=[4,5,6],columns=list("ABCD"))
df3 = pd.merge(
s1,
df2,
suffixes=("_left", "_right"),
how="left",
left_index=True,
right_on="A",
)
print(df3)
The output is:
A A_left A_right B C D
NaN 4 100 NaN NaN NaN NaN
NaN 5 200 NaN NaN NaN NaN
NaN 6 300 NaN NaN NaN NaN
So here is the question, why the second column (labeled with A) has the values of the input series' index, and A is converted to a column, not index?
Further more, can anyone give a more detailed explanation of the rule of the output columns of the merge API?
I have read the docs but it's a little brief.
I think you misunderstood the name of a
Series:In your code you try to find a relation between the index of
s1(which has no name) and the 'A' column ofdf2. So you try to match [4, 5, 6] with [100, 200, 300] which doesn't make sense.However, what you should do is:
If you want to join by index, you have to use: