I am trying to filter a pandas DataFrame:
import pandas as pd

df = pd.read_csv('ml_data.csv', dtype=str)

def df_filter(df):
    #df = df.copy()
    df.replace('(not set)', '(none)', inplace=True)  # comment this out and the warning disappears
    df = df[df['device_browser'] != '(none)']  # comment this out and the warning disappears

    def browser_filter(s):
        # keep only the alphabetic characters of the browser name
        return ''.join([c for c in s if c.isalpha()])

    df['device_browser'] = df['device_browser'].apply(browser_filter)
    return df

df = df_filter(df)
And I receive this warning:
/tmp/ipykernel_2185/1710484338.py:11: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['device_browser'] = df['device_browser'].apply(browser_filter)
But if I uncomment
#df = df.copy()
OR comment out
df.replace('(not set)', '(none)', inplace=True)
OR comment out
df = df[df['device_browser'] != '(none)']
OR do not wrap the filtering in the df_filter function,
the warning disappears. Why?
I danced around the fire and beat the tambourine...
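Because calling
df.copy()
creates a deep copy of the DataFrame (the documentation shows that deep=True is the default). Inside df_filter, the filtered result is then derived from that temporary copy, which is discarded as soon as df is rebound, so the warning disappears.
If you don't copy, note that
df.replace('(not set)', '(none)', inplace=True)
does not create a shallow copy; it modifies the caller's DataFrame in place, which makes sure there are '(none)' rows to drop. After that, the filter
df = df[df['device_browser'] != '(none)']
returns a new DataFrame that pandas tracks as a possible copy of the original, and assigning to df['device_browser'] on that result is what triggers the warning. In pandas versions that emit this warning, the check appears to be skipped when the original object no longer exists or when the filtered result has the same shape as the original. That would explain all of your cases: without the replace there are presumably no '(none)' rows, so the filter removes nothing; without the filter there is no derived DataFrame at all; and outside a function, df = ... drops the last reference to the original. I invite you to check the difference between shallow and deep copy in this Stack Overflow question.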
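One way to avoid the warning altogether, as a minimal sketch rather than the only fix: do not modify the caller's DataFrame in place, and take an explicit copy of the filtered rows before assigning to a column. The sample DataFrame below is made up for illustration and only stands in for ml_data.csv; the local name out is likewise just an assumption of this sketch.

```python
import pandas as pd


def browser_filter(s):
    # Keep only the alphabetic characters, as in the question.
    return ''.join(c for c in s if c.isalpha())


def df_filter(df):
    # replace() without inplace=True returns a new DataFrame,
    # so the caller's data is left untouched.
    out = df.replace('(not set)', '(none)')
    # .copy() makes the filtered rows an independent DataFrame,
    # so the column assignment below is unambiguous.
    out = out[out['device_browser'] != '(none)'].copy()
    out['device_browser'] = out['device_browser'].apply(browser_filter)
    return out


# Hypothetical sample data standing in for ml_data.csv.
sample = pd.DataFrame({
    'device_browser': ['Chrome 90', '(not set)', 'Safari (in-app)', '(none)'],
})
result = df_filter(sample)
print(result)
```

Note that this version behaves slightly differently from the original function: the DataFrame passed in keeps its '(not set)' values, and only the returned DataFrame is cleaned.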