Pandas Vectorized Operation - Making counting function that resets when threshold reaches 5

53 Views Asked by At

I am quite new to the programming and Im struggling to this matter. Any help is appreciated!

I have a dataframe of stocks including the prices and the signal if it will be up (1) or down (-1). I want to count the sequence of repetition into another column 'count'. So, when there is a sequence of 1,1,1; then the count will be 1,2,3. If its -1,-1,-1; then the count will be 1,2,3 too.

Additionally, when a threshold value reaches 5, the counting resets. Doesn't matter if it's 1 or -1.

So, what I have is:

     price  sign
0    13     1
1    12     1
2    11     -1
3    12     -1
4    13     1
5    14     1
6    14     1
7    14     1
8    14     1
9    14     1
10   14     1
.
.
.

And what I want is:

     price  sign  count
0    13     1       1
1    12     1       2
2    11     -1      1
3    12     -1      2
4    13     1       1
5    14     1       2
6    14     1       3
7    14     1       4
8    14     1       5
9    14     1       1
10   14     1       2
.
.
.

I already have this code in normal python code. But I cannot do this in Pandas Vectorized Operation! Help me, please!

2

There are 2 best solutions below

1
jezrael On BEST ANSWER

Use GroupBy.cumcount by consecutive values of sign with modulo 5:

df['count'] = df.groupby(df['sign'].ne(df['sign'].shift()).cumsum()).cumcount() % 5 + 1

print (df)
    price  sign  count
0      13     1      1
1      12     1      2
2      11    -1      1
3      12    -1      2
4      13     1      1
5      14     1      2
6      14     1      3
7      14     1      4
8      14     1      5
9      14     1      1
10     14     1      2

Detail:

print (df.assign(consecutive=df['sign'].ne(df['sign'].shift()).cumsum(),
                 counter=df.groupby(df['sign'].ne(df['sign'].shift()).cumsum()).cumcount(),
                 count = df.groupby(df['sign'].ne(df['sign'].shift()).cumsum()).cumcount() % 5 + 1))
    price  sign  consecutive  counter  count
0      13     1            1        0      1
1      12     1            1        1      2
2      11    -1            2        0      1
3      12    -1            2        1      2
4      13     1            3        0      1
5      14     1            3        1      2
6      14     1            3        2      3
7      14     1            3        3      4
8      14     1            3        4      5
9      14     1            3        5      1
10     14     1            3        6      2
1
Tino D On

you can first create a mask when the sign shifts from pos to neg or opposite. This is to reset when the signs change.

mask = df['sign'].ne(df['sign'].shift())

Then, create a grouping variable based on the change in sign

group = mask.cumsum()

Use the group by function to do a cumulative count, here you add 1 whenever you have a repetition of the sign.

df['count'] = df.groupby(group).cumcount() + 1

Reset if it reaches five, back to 1 (hence the +1), this is the trickiest part I guess, here is a nice introduction to lamba function:

df['count'] = df.groupby(group)['count'].apply(lambda x: (x - 1) % 5 + 1)

And then the change of signs also resets the count to 1, which is what we identified in the beginning

df.loc[mask, 'count'] = 1

Hope this helps