How to generate a flag based on the status column of 12 consecutive period

87 Views Asked by At
ID  Period   status Flag (Output)
A   2020-10-28  0   1
A   2020-11-28  0   1
A   2020-12-28  0   1
A   2021-01-28  0   1
A   2021-02-28  0   1
A   2021-03-28  0   1
A   2021-04-28  0   1
A   2021-05-28  0   1
A   2021-06-28  1   0
A   2021-07-28  0   0
A   2021-08-28  0   0
A   2021-09-28  0   0
A   2021-10-28  0   0

Here is the simple example for generating flag based on the status column: example: for unique ID-A, to flag the first row I check the 'status' column for next 12 month if the status within next 12 months is 1, I will flag the O/P as 1. below is my sample code which gives me the desired result, however performance wise this is not efficient for a large set of data (have close to ~1 million data with different ID). Is there a way that this can be coded in a simpler and faster way or looping is the only option.

   df = df1[['ID','Period','status']]
   df = df.sort_values(by=['ID','Period']).reset_index(drop=True)
   df['flag']=0
   default=pd.DataFrame()

   j=df['ID'].unique()
   k=pd.DataFrame(j,columns=['ID'])
   for a,i in k.iterrows():
         temp = df[df['ID']==i['ID']]
         temp = temp.reset_index(drop=True)
 
         for x,y in temp.iterrows():
               if temp.iloc[x+1:x+13,2].sum() >= 1:
                    temp.loc[x,'flag']=1
               else:
                    temp.loc[x,'flag']=0
         default = default.append(temp)
1

There are 1 best solutions below

6
mozway On

You can use a groupby.rolling:

df['Flag'] = (df
      .assign(Period=pd.to_datetime(df['Period']).dt.to_period('M'))
      [::-1]
      .groupby('ID').rolling(12, on='Period', min_periods=1)
      ['status'].max()[::-1].to_numpy()
)

Output:

   ID      Period  status  Flag
0   A  2020-10-28       0   1.0
1   A  2020-11-28       0   1.0
2   A  2020-12-28       0   1.0
3   A  2021-01-28       0   1.0
4   A  2021-02-28       0   1.0
5   A  2021-03-28       0   1.0
6   A  2021-04-28       0   1.0
7   A  2021-05-28       0   1.0
8   A  2021-06-28       1   1.0
9   A  2021-07-28       0   0.0
10  A  2021-08-28       0   0.0
11  A  2021-09-28       0   0.0
12  A  2021-10-28       0   0.0

To only consider the previous periods:

df['Flag'] = (df
      .assign(Period=pd.to_datetime(df['Period']).dt.to_period('M'))
      .set_index('Period')
      [::-1]
      .groupby('ID')['status']
      .transform(lambda g: g.rolling(12, min_periods=1)
                            .max().shift(fill_value=0)
                 )
      .to_numpy()[::-1]
)

Output:

   ID      Period  status  Flag
0   A  2020-10-28       0   1.0
1   A  2020-11-28       0   1.0
2   A  2020-12-28       0   1.0
3   A  2021-01-28       0   1.0
4   A  2021-02-28       0   1.0
5   A  2021-03-28       0   1.0
6   A  2021-04-28       0   1.0
7   A  2021-05-28       0   1.0
8   A  2021-06-28       1   0.0
9   A  2021-07-28       0   0.0
10  A  2021-08-28       0   0.0
11  A  2021-09-28       0   0.0
12  A  2021-10-28       0   0.0