Setting the last n non-NaN values per group to NaN


I have a DataFrame with (several) grouping variables and (several) value variables. My goal is to set the last n non-NaN values per group to NaN. So let's take a simple example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
df
Out[1]:
   id  value
0   1    1.0
1   1    2.0
2   1    NaN
3   2    9.0
4   2    8.0

The desired result for n=1 would look like the following:

Out[53]:
   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN

3 Answers

Quang Hoang (accepted answer):

Use groupby().cumcount() together with each group's non-NaN size:

N = 1
groups = df.loc[df['value'].notna()].groupby('id')
enum = groups.cumcount()                     # position of each non-NaN value within its group
sizes = groups['value'].transform('size')    # number of non-NaN values in each group

df['value'] = df['value'].where(enum < sizes - N)   # keep only values not among the last N

Output:

   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
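
Since the question mentions several value columns, here is a minimal sketch of how the same cumcount/size comparison could be wrapped into a reusable helper; the function name blank_last_n and the value_cols argument are illustrative assumptions, not part of the answer:

import numpy as np
import pandas as pd

def blank_last_n(df, group_col, value_cols, n=1):
    # Sketch: set the last n non-NaN entries of each value column to NaN, per group,
    # using the cumcount/size comparison from the answer above.
    out = df.copy()
    for col in value_cols:
        groups = out.loc[out[col].notna()].groupby(group_col)
        enum = groups.cumcount()                  # position of each non-NaN value within its group
        sizes = groups[col].transform('size')     # number of non-NaN values in the group
        out[col] = out[col].where(enum < sizes - n)
    return out

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
blank_last_n(df, 'id', ['value'], n=1)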
mozway:

One option: create a reversed cumcount on the non-NA values:

N = 1
m = (df
     .loc[df['value'].notna()]         # restrict to non-NaN rows
     .groupby('id')
     .cumcount(ascending=False)        # count positions from the end of each group
     .lt(N)                            # True for the last N non-NaN values
     )
df.loc[m[m].index, 'value'] = np.nan   # set those rows to NaN

Similar approach with boolean masking:

m = df['value'].notna()
# reversed cumulative count of non-NaN values per group; mask the last N
df['value'] = df['value'].mask(m[::-1].groupby(df['id']).cumsum().le(N))

Output:

   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
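
As a quick usage check, the masking variant can be compared against the accepted cumcount-based answer on the example data; this is a small sketch assuming N = 1 and the same df as in the question:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
N = 1

# accepted answer: keep values that are not among the last N non-NaN values of their group
groups = df.loc[df['value'].notna()].groupby('id')
expected = df.assign(value=df['value'].where(
    groups.cumcount() < groups['value'].transform('size') - N))

# masking approach: reversed cumulative count of non-NaN values per group
m = df['value'].notna()
result = df.assign(value=df['value'].mask(m[::-1].groupby(df['id']).cumsum().le(N)))

pd.testing.assert_frame_equal(result, expected)   # both blank the last N non-NaN values per group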
BENY:

You can take the cumulative sum of the non-NaN flags, reversed and grouped by id, to count how many non-NaN values remain from the end of each group, and keep a value only when that count exceeds 1:

df['value'] = df['value'].where(df['value'].notna().iloc[::-1].groupby(df['id']).cumsum() > 1)
df
Out[86]: 
   id  value
0   1    1.0
1   1    NaN
2   1    NaN
3   2    9.0
4   2    NaN
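
If a general n is needed rather than the hard-coded 1, the same reversed cumulative sum can be compared against n; a minimal sketch, where the variable n is an assumed parameter and not part of the original answer:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2],
                   'value': [1, 2, np.nan, 9, 8]})
n = 1   # number of trailing non-NaN values to blank per group (assumed parameter)

df['value'] = df['value'].where(df['value'].notna().iloc[::-1].groupby(df['id']).cumsum() > n)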