Python: vectorised function works only on the first run; subsequent loops have no effect


I have a vectorised function:

def selection_update_weights(df):
    # Define the selections for 'Win'
    selections_win = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", "W & O 2.5", 
                      "W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)", "W & O 1.5", 
                      "W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)", "W & U 4.5", 
                      "W (untested)", "W"]

    # Create a boolean mask for the condition for 'Win'
    mask_win = (df['selection_match'] == "no match") & \
               (df['selection'].isin(selections_win)) & \
               (df['result_match'] == "no match") & \
               (df['result'] != 'draw')

    # Apply the condition and update the 'Win' column
    df.loc[mask_win, 'Win'] = df.loc[mask_win, 'predicted_score_difference'] + 0.02

    # Define the selections for 'DNB'
    selections_DNB = ["DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5", "DNB or O 2.5 (untested)",
                      "DNB or O 2.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5", 
                      "DNB or O 1.5 (untested)", "DNB or O 1.5", "DNB (untested)", "DNB"]

    # Create a boolean mask for the condition for 'DNB'
    mask_DNB = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_DNB)) & \
                (df['result_match'] == 'no match') & \
                (df['result'] != 'draw'))

    # Apply the condition and update the 'DNB' column
    df.loc[mask_DNB, 'DNB'] = df.loc[mask_DNB, 'predicted_score_difference'] + 0.02

    # Define the selections for 'O 1.5'
    selections_O_1_5 = ["W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)",
                        "W & O 1.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5", 
                        "DNB or O 1.5 (untested)", "DNB or O 1.5", "O 1.5 (untested)", "O 1.5"]

    # Create a boolean mask for the condition for 'O 1.5'
    mask_O_1_5 = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_O_1_5)) & \
                (df['total_score'] < 2))

    # Apply the condition and update the 'O 1.5' column
    df.loc[mask_O_1_5, 'O_1_5'] = df.loc[mask_O_1_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'O 2.5'
    selections_O_2_5 = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", 
                        "W & O 2.5", "DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5",
                        "DNB or O 2.5 (untested)", "DNB or O 2.5", "O 2.5 (untested)", "O 2.5"]

    # Create a boolean mask for the condition for 'O 2.5'
    mask_O_2_5 = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_O_2_5)) & \
                (df['total_score'] < 3))

    # Apply the condition and update the 'O 2.5' column
    df.loc[mask_O_2_5, 'O_2_5'] = df.loc[mask_O_2_5, 'predicted_total_score'] + 0.02

    # Define the selections for 'U 4.5'
    selections_U_4_5 = ["W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)",
                        "W & U 4.5", "U 4.5 (untested)", "U 4.5"]

    # Create a boolean mask for the condition for 'U 4.5'
    mask_U_4_5 = ((df['selection_match'] == 'no match') & \
                (df['selection'].isin(selections_U_4_5)) & \
                (df['total_score'] > 4))

    # Apply the condition and update the 'U 4.5' column
    df.loc[mask_U_4_5, 'U_4_5'] = df.loc[mask_U_4_5, 'predicted_total_score'] - 0.02

    return df

I apply it with:

df = selection_update_weights(df)

First run works:

home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match  Win  DNB  O_1_5  O_2_5  U_4_5                  selection selection_match
3            2           0            2                 2              12.370528                   12.090888   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
9            2           0            2                 2              11.439416                   10.291339   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
10           2           0            2                 2              11.226599                   10.228954   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
11           1           5            6                 4              12.069979                   10.194557   away             home     no match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match
20           2           0            2                 2               9.808659                    9.049657   home             home        match  1.1  0.7      2      3      4  W & O 2.5 (both untested)        no match

Running the function gives:

home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match       Win  DNB     O_1_5     O_2_5  U_4_5                  selection selection_match
44           3           3            6                 0               8.748172                    8.135116   draw             home     no match  8.155116  0.7  2.000000  3.000000    4.0  W & O 2.5 (both untested)        no match
50           1           0            1                 1               8.605350                    7.932909   home             home        match  1.100000  0.7  8.625350  8.625350    4.0  W & O 1.5 (both untested)        no match
57           1           1            2                 0               7.510030                    7.750101   draw             home     no match  7.770101  0.7  2.000000  7.530030    4.0  W & O 1.5 (both untested)        no match
62           0           1            1                 1               8.895045                    7.710740   away             away        match  1.100000  0.7  8.915045  8.915045    4.0  W & O 1.5 (both untested)        no match
85           1           0            1                 1               8.099853                    7.444815   home             home        match  1.100000  0.7  8.119853  8.119853    4.0  W & O 1.5 (both untested)        no match

However, subsequent loops don't yield any changes.

My dataframe is very large, but this snippet shows where the weights don't get updated. The weights are only partially updated, and I am not sure why.

df.head():
home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match  Win  DNB  O_1_5     O_2_5  U_4_5                  selection selection_match
44           3           3            6                 0               8.748172                    8.135116   draw             home     no match  1.1  0.7    2.0  3.000000    4.0  W & O 2.5 (both untested)        no match
50           1           0            1                 1               8.605350                    7.932909   home             home        match  1.1  0.7    2.0  8.625350    4.0  W & O 1.5 (both untested)        no match
57           1           1            2                 0               7.510030                    7.750101   draw             home     no match  1.1  0.7    2.0  7.530030    4.0  W & O 1.5 (both untested)        no match
62           0           1            1                 1               8.895045                    7.710740   away             away        match  1.1  0.7    2.0  8.915045    4.0  W & O 1.5 (both untested)        no match
85           1           0            1                 1               8.099853                    7.444815   home             home        match  1.1  0.7    2.0  8.119853    4.0  W & O 1.5 (both untested)        no match

So when I apply it:

df = selection_update_weights(df)

I should ideally get:

home_score  away_score  total_score  score_difference  predicted_total_score  predicted_score_difference result predicted_result result_match       Win  DNB    O_1_5    O_2_5    U_4_5                       selection  selection_match
          3           3            6                 0               8.748172                    8.135116   draw             home     no match  8.155116  0.7       2.0         3      4.0      W & O 2.5 (both untested)        no match
          1           0            1                 1               8.605350                    7.932909   home             home        match  1.100000  0.7  8.625350  8.625350      4.0      W & O 1.5 (both untested)        no match
          1           1            2                 0               7.510030                    7.750101   draw             home     no match  7.770101  0.7       2.0  7.530030      4.0      W & O 1.5 (both untested)        no match
          0           1            1                 1               8.895045                    7.710740   away             away        match  1.100000  0.7  8.915045  8.915045      4.0      W & O 1.5 (both untested)        no match
          1           0            1                 1               8.099853                    7.444815   home             home        match  1.100000  0.7  8.119853  8.119853      4.0      W & O 1.5 (both untested)        no match

However, that's not happening, and the original dataframe is unaffected.

Would it help to break this down into row-by-row if-else checks? The dataframe is too big for that, and the row-wise calculations take 20 minutes.

patoba On

In the second loop, only the Win and O_2_5 columns are updated. Win is updated as a function of predicted_score_difference, and O_2_5 as a function of predicted_total_score.

predicted_score_difference and predicted_total_score never change inside selection_update_weights, so within that function they are effectively constants. Because every call assigns values computed from those constants, calling the function again produces exactly the same values as the first call.
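
To illustrate, here is a minimal sketch on a toy dataframe. The data is made up, and `update_win` is a hypothetical, cut-down version of only the 'Win' step from the question, not the full function. The right-hand side reads a column the function never writes, so a second call reassigns the same values:

```python
import pandas as pd

# Toy data, made up for illustration; column names follow the question.
df = pd.DataFrame({
    "predicted_score_difference": [8.0, 7.5, 6.0],
    "result": ["home", "draw", "away"],
    "Win": [1.1, 1.1, 1.1],
})

def update_win(df):
    # Hypothetical, simplified version of the 'Win' step only.
    mask = df["result"] != "draw"
    # The right-hand side depends only on a column this function never modifies.
    df.loc[mask, "Win"] = df.loc[mask, "predicted_score_difference"] + 0.02
    return df

first = update_win(df.copy())
second = update_win(first.copy())

# Compare the 'Win' column after one call and after two calls.
print(first["Win"].equals(second["Win"]))
```

The comparison should report that the two columns are identical: the first call moves the masked rows away from their defaults, and every later call writes the same numbers again.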

I'm not sure why you want to call selection_update_weights multiple times, but if you do, either update Win, O_2_5 (and the other weight columns) in terms of themselves, or update predicted_score_difference and predicted_total_score inside selection_update_weights.
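
If the intent is for each call to nudge the weights further, one possible sketch of the first option is below. The function `bump_win`, its `step` parameter, and the toy data are assumptions for illustration; whether accumulating like this is what you actually want depends on your model:

```python
import pandas as pd

# Toy data, made up for illustration.
df = pd.DataFrame({
    "result": ["home", "draw", "away"],
    "Win": [1.1, 1.1, 1.1],
})

def bump_win(df, step=0.02):
    # Hypothetical variant: the update reads the same column it writes,
    # so each call moves the weight further.
    mask = df["result"] != "draw"
    df.loc[mask, "Win"] = df.loc[mask, "Win"] + step
    return df

for _ in range(3):
    df = bump_win(df)

print(df["Win"].round(2).tolist())  # [1.16, 1.1, 1.16]
```

Here the masked rows change on every call, while the 'draw' row keeps its original weight.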