Finding First and Last Values of a "Run" in Pandas

71 Views Asked by At

I am trying to figure out the best way to get information on a "run" in pandas

The sample code below returns the following results:

enter image description here

import pandas as pd
import numpy as np


df2 = pd.DataFrame({'Price':[0.0, 3.6, 9.3, 4.5, 2.9, 3.2, 1.0, 6.7, 8.7, 9.8, 3.4, .7, 2.2, 6.5, 3.4, 1.7, 9.4, 10.0], 'PriceDate':['2023-10-01', '2023-10-02', '2023-10-03', '2023-10-04', '2023-10-05', '2023-10-06', '2023-10-07', '2023-10-08',  '2023-10-09',  '2023-10-10',  '2023-10-11',  '2023-10-12',  '2023-10-13',  '2023-10-14',  '2023-10-15',  '2023-10-16',  '2023-10-17']})


df2['Trend']=np.where(df2['Price']>df2['Price'].shift(),"UPTREND","DOWNTREND")

Now ignore for a second that the first value shouldn't have a trend value. The trend value simply shows if the current price is greater than the prior price (uptrend) or less than the lower price (downtrend).

What I want to know is

  1. What is the first date of any uptrend/downtrend
  2. What is the last date of the uptrend/downtrend
  3. What is the first price of the uptrend/downtrend
  4. What is the last price of the uptrend/downtrend

So - the first uptrend starts on 10/2 and ends on 10/3, first price is 3.6, last price is 9.3 Another uptrend started on 10/8 ended on 10/10 with a first price of 6.7 and an end price on 9.8

I'd also like to get the last price of the prior trend, so for example - that 10/8 record looks like this

enter image description here

Any help will be greatly appreciated

2

There are 2 best solutions below

0
Sand On

One simple approach could be holding a variable to record the current state and the values associated with the start of a run, and iterate down the dataframe until that value changes. Record, and restart.

Pseudo code:

df = pd.read_excel("path/to/spreadhseet.xlsx")

run = { "First Date":None, ... "End Price":None, "Status":None }
runs = []

# Set the first row to the 'run'
firstRow = df.iloc[0]
run["First Date"] = firstRow["PriceDate"]
run["Start Price"] = firstRow["Price"]
run["Status"] = firstRow["Trend"]

# Iterate down the dataframe, starting @1 since we already have row 0
for rowIndex in range(1, len(df)):
    row = df.iloc[rowIndex]
    previousRow = df.iloc[rowIndex - 1]
    # Trend changed
    if row["Trend"] != run["Status"]: # Would need to consider the run at EOF
        run["Last Date"] = previousRow["PriceDate"]
        run["Last Price"] = previousRow["Price"]
        # Run data now complete, save and make new 'run'
        runs.append(run)
        run = { "First Date":None, ... "End Price":None, "Status":None }
        run["First Date"] = row["PriceDate"]
        run["Start Price"] = row["Price"]
        run["Status"] = row["Trend"]

With this, you would then have a list runs that contains a dictionary with the info for each run that could be converted into a dataframe, matching your expected output.

However, this is very much still pseudo code would absolutely need to be reviewed to account for how your data is structured, accounting for the run on the last row, runs that are only 1 entry long, etc etc --- but something like this should get you pretty close. Good luck!

0
Nick On

You can achieve the result you want by grouping on the trend "number" (incrementing each time the trend changes), then taking first and last values of the date and price. You can then generate the prior end price as the end price shifted.

df2 = pd.DataFrame({'Price':[0.0, 3.6, 9.3, 4.5, 2.9, 3.2, 1.0, 6.7, 8.7, 9.8, 3.4, .7, 2.2, 6.5, 1.7, 9.4, 10.0], 'PriceDate':['2023-10-01', '2023-10-02', '2023-10-03', '2023-10-04', '2023-10-05', '2023-10-06', '2023-10-07', '2023-10-08',  '2023-10-09',  '2023-10-10',  '2023-10-11',  '2023-10-12',  '2023-10-13',  '2023-10-14',  '2023-10-15',  '2023-10-16',  '2023-10-17']})

df2['Trend']=np.where(df2['Price']>df2['Price'].shift(),"UPTREND","DOWNTREND")

# set the first value of Trend to '' so it doesn't get included in the groups
df2.loc[0, 'Trend'] = ''
g = df2['Trend'].ne(df2['Trend'].shift()).cumsum() - 1

res = (df2[g>0]
    .groupby(g).agg(**{ 
        'First Date': ('PriceDate', 'first'),
        'Last Date': ('PriceDate', 'last'),
        'Start Price' : ('Price', 'first'),
        'End Price' : ('Price', 'last') 
    })
    .reset_index(drop=True)
)
res['Prior End Price'] = res['End Price'].shift().fillna(df2['Price'][0])

Output:

   First Date   Last Date  Start Price  End Price  Prior End Price
0  2023-10-02  2023-10-03          3.6        9.3              0.0
1  2023-10-04  2023-10-05          4.5        2.9              9.3
2  2023-10-06  2023-10-06          3.2        3.2              2.9
3  2023-10-07  2023-10-07          1.0        1.0              3.2
4  2023-10-08  2023-10-10          6.7        9.8              1.0
5  2023-10-11  2023-10-12          3.4        0.7              9.8
6  2023-10-13  2023-10-14          2.2        6.5              0.7
7  2023-10-15  2023-10-15          1.7        1.7              6.5
8  2023-10-16  2023-10-17          9.4       10.0              1.7