I have this code:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4]})
df2 = df[df['a'] > 2]
df2['b'] = df2['a'] * 2
This code raises a SettingWithCopyWarning. The warning is a false positive because I've assigned the result of the slice operation to another dataframe, and I intentionally only want to modify df2 and not df.
To avoid this warning, I usually call .copy():
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4]})
df2 = df[df['a'] > 2].copy()
df2['b'] = df2['a'] * 2
However, this is inefficient; df2 is already a copy and not a view, so there's no point in creating another copy.
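One way to check the claim that the filtered frame is already a copy is to test whether the two frames share memory. This is only a sketch: it assumes `np.shares_memory` on the arrays returned by `to_numpy()` is a fair proxy for "view vs. copy" on a single integer column, and the name `shares` is just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})
df2 = df[df['a'] > 2]  # boolean-mask selection

# Compare the underlying buffers of the original and the filtered column.
shares = np.shares_memory(df['a'].to_numpy(), df2['a'].to_numpy())
print(shares)  # False: the filtered frame has its own data
```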
A more efficient way to do this is to set the _is_copy attribute:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4]})
df2 = df[df['a'] > 2]
df2._is_copy = None
df2['b'] = df2['a'] * 2
However, this relies on the private attributes of the dataframe and isn't future-proof.
How can I reset the _is_copy attribute without doing a heavyweight copy operation?
I'm using Pandas 1.5.3 and cannot easily switch to Pandas 2.0 or above.
You can use method chaining.
Code:
Output:
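A minimal sketch of what method chaining could look like here, assuming `DataFrame.assign` is the intended tool: the new column is created in the same expression as the filter, so there is never an intermediate `df2` to assign into and no SettingWithCopyWarning is raised. The lambda receives the already-filtered frame.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4]})

# Filter and add the column in one chained expression.
# assign() returns a new DataFrame; df itself is left untouched.
df2 = df[df['a'] > 2].assign(b=lambda d: d['a'] * 2)
print(df2)
```

This prints:

```
   a  b
2  3  6
3  4  8
```

Note that `assign` returns a new object rather than modifying in place, so this avoids the warning and the private attribute, but it should not be read as a guarantee that no copy is made internally.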