Adding a new column to a df is not carrying over to a shallow copy of the df

I have a very simple df, which I'm going to make a shallow copy of.

import pandas as pd

# assign dataframe
old_df = pd.DataFrame({'values': [10, 20, 30, 40]})

# shallow copy
copy_df = old_df.copy(deep=False)

I understand that with a shallow copy, changes made to the copy should carry over to the original. So if I add a new column to copy_df, I'd expect old_df to get that column as well.

I tried creating the new column in two ways.

# method 1
copy_df.loc[:, 'new_col'] = [0, 0, 0, 0]
# method 2 
copy_df['new_col'] = [0, 0, 0, 0]

My expected result is as follows:

>>> old_df
   values  new_col
0      10        0
1      20        0
2      30        0
3      40        0

But what I get, from both methods, is the original, unchanged df:

>>> old_df
   values
0      10
1      20
2      30
3      40

I would like to ask why the change I made to the shallow copy is not carrying over to the original.

There are 2 best solutions below

henrylin03 On

This has been the expected behaviour since pandas v1.4: https://github.com/pandas-dev/pandas/issues/47703

Corralien On

I understand that in a shallow copy, the changes made to one should carry over to the original. So if I create a new column (change) to the copy_df, I'd expect the change to be made to the old_df as well.

Yes, this is true for every column (Series) that existed before the copy was made. A new column, however, is added only to the DataFrame you assign it to: the two DataFrames share references to the existing columns, but each one keeps its own set of columns.

# Create new column
copy_df.loc[:, 'new_col'] = [0, 0, 0, 0]  # or copy_df['new_col'] = [0, 0, 0, 0]
print(old_df)

# Output
   values
0      10
1      20
2      30
3      40

# Modify existing column
copy_df.loc[[1, 2], 'values'] = 0
print(old_df)

# Output
   values
0      10
1       0
2       0
3      40
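
If it helps to see the same idea without pandas, here is a rough plain-Python analogy. It is only a sketch: the dict of lists stands in for a DataFrame and its columns, which is my own simplification and not how pandas stores data internally. The names old and shallow are made up for the illustration. Also note that, as far as I know, with pandas' Copy-on-Write mode enabled even in-place changes to a shallow copy are no longer reflected in the original, so the second half of the analogy only matches the behaviour shown in the output above.

```python
import copy

# Hypothetical stand-in for a DataFrame: column names mapped to lists of values.
old = {"values": [10, 20, 30, 40]}

# A shallow copy is a new dict that points at the same list objects.
shallow = copy.copy(old)

# Adding a key changes only the new dict, so the original gains no column.
shallow["new_col"] = [0, 0, 0, 0]
print(sorted(old))      # ['values']
print(sorted(shallow))  # ['new_col', 'values']

# Changing the shared list in place is visible through both dicts.
shallow["values"][1] = 0
print(old["values"])    # [10, 0, 30, 40]
```

So "adding a column" changes the container, which is not shared, while "modifying an existing column" changes the data, which is.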