ID Period status Flag (Output)
A 2020-10-28 0 1
A 2020-11-28 0 1
A 2020-12-28 0 1
A 2021-01-28 0 1
A 2021-02-28 0 1
A 2021-03-28 0 1
A 2021-04-28 0 1
A 2021-05-28 0 1
A 2021-06-28 1 0
A 2021-07-28 0 0
A 2021-08-28 0 0
A 2021-09-28 0 0
A 2021-10-28 0 0
Here is the simple example for generating flag based on the status column: example: for unique ID-A, to flag the first row I check the 'status' column for next 12 month if the status within next 12 months is 1, I will flag the O/P as 1. below is my sample code which gives me the desired result, however performance wise this is not efficient for a large set of data (have close to ~1 million data with different ID). Is there a way that this can be coded in a simpler and faster way or looping is the only option.
df = df1[['ID','Period','status']]
df = df.sort_values(by=['ID','Period']).reset_index(drop=True)
df['flag']=0
default=pd.DataFrame()
j=df['ID'].unique()
k=pd.DataFrame(j,columns=['ID'])
for a,i in k.iterrows():
temp = df[df['ID']==i['ID']]
temp = temp.reset_index(drop=True)
for x,y in temp.iterrows():
if temp.iloc[x+1:x+13,2].sum() >= 1:
temp.loc[x,'flag']=1
else:
temp.loc[x,'flag']=0
default = default.append(temp)
You can use a
groupby.rolling:Output:
To only consider the previous periods:
Output: