How to create a dataframe column where each value on a given row depends on the values of subsecant rows?

65 Views Asked by At

So I'm working with time series data, forex data to be more precise. They are in the form of

  <DATE>      <OPEN>     <HIGH>      <LOW>      <CLOSE>
2023.01.03   1.20782    1.20782     1.20775     1.20775
2023.01.03   1.20771    1.20771     1.20771     1.20771
2023.01.03   1.20739    1.20739     1.20739     1.20731
2023.01.03   1.20742    1.20742     1.20742     1.20742
2023.01.03   1.20736    1.20736     1.20736     1.20736
2023.01.03   1.20744    1.20744     1.20744     1.20744
2023.01.03   1.20762    1.20762     1.20762     1.20762

I need to add a new column 'target' of true and false values. When we take a row, we check up all the subsequent rows on the same day. If any of them has a high of 100 points more than the current row the target is put to true, but if while checking the subsequent row we find one that has a low of 10 points less than the current row the target is put to false.

My solution to the problem is to iterate over all the rows saving them in a list as open trade and while iterating comparing the rows in the list of trade with the current row to check if the trade end with a stop loss or a take profit. Here is my code :

def add_target_data(df):
    new_df = df.copy()

    new_df['target'] = False

    grouped = new_df.groupby('date')
    print("Number of days", len(grouped.groups))

    for _date, day in grouped:
        #print('Day:', _date)
        trades = []
        update_indices = []
        for index, row in day.iterrows():
            # trades = [_trade for _trade in trades if _trade['stop_loss'] < row['low']]
            for trade in trades:
                if trade['stop_loss'] >= row['low']:
                    trades.remove(trade)
                elif trade['take_profit'] <= row['high']:
                    #data.loc[trade['index'], 'target'] = True
                    update_indices.append(trade['index'])
                    trades.remove(trade)
                else:
                    pass

            trade = {
                'index': index,
                'stop_loss': row['close'] - 0.0001,
                'take_profit': row['close'] + 0.001
            }
            trades.append(trade)
        new_df.loc[update_indices, 'target'] = True

    return new_df

After running this code on a data frame of about 100000 rows, here is what we should get :

                       date       open   high    low     close  target
        index                       
2023-01-02 08:41:00 2023-01-02  1.20672 1.20673 1.20667 1.20667 False
2023-01-02 08:44:00 2023-01-02  1.20665 1.20667 1.20664 1.20667 True
2023-01-02 08:46:00 2023-01-02  1.20665 1.20666 1.20665 1.20666 False
2023-01-02 09:06:00 2023-01-02  1.20627 1.20635 1.20627 1.20635 False
2023-01-02 09:08:00 2023-01-02  1.20637 1.20638 1.20629 1.20633 True
... ... ... ... ... ... ...
2023-04-11 16:42:00 2023-04-11  1.24262 1.24267 1.24240 1.24240 True
2023-04-11 16:45:00 2023-04-11  1.24276 1.24283 1.24257 1.24271 False
2023-04-11 16:46:00 2023-04-11  1.24272 1.24282 1.24264 1.24278 False
2023-04-11 16:58:00 2023-04-11  1.24295 1.24302 1.24285 1.24289 False
2023-04-11 17:46:00 2023-04-11  1.24308 1.24308 1.24253 1.24258 True

(please this sample is just an example, in reality the rows that get a target value of true constitute only about 10% of the entire data frame)

This solution, works fine but when the data reaches few millions rows it becomes a pain to run. Does anybody knows how to do it faster and more efficiently?

0

There are 0 best solutions below