I have a vectorised function:
def selection_update_weights(df):
    # Define the selections for 'Win'
    selections_win = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)", "W & O 2.5",
                      "W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)", "W & O 1.5",
                      "W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)", "W & U 4.5",
                      "W (untested)", "W"]
    # Create a boolean mask for the condition for 'Win'
    mask_win = ((df['selection_match'] == "no match") &
                (df['selection'].isin(selections_win)) &
                (df['result_match'] == "no match") &
                (df['result'] != 'draw'))
    # Apply the condition and update the 'Win' column
    df.loc[mask_win, 'Win'] = df.loc[mask_win, 'predicted_score_difference'] + 0.02
    # Define the selections for 'DNB'
    selections_DNB = ["DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5", "DNB or O 2.5 (untested)",
                      "DNB or O 2.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5",
                      "DNB or O 1.5 (untested)", "DNB or O 1.5", "DNB (untested)", "DNB"]
    # Create a boolean mask for the condition for 'DNB'
    mask_DNB = ((df['selection_match'] == 'no match') &
                (df['selection'].isin(selections_DNB)) &
                (df['result_match'] == 'no match') &
                (df['result'] != 'draw'))
    # Apply the condition and update the 'DNB' column
    df.loc[mask_DNB, 'DNB'] = df.loc[mask_DNB, 'predicted_score_difference'] + 0.02
    # Define the selections for 'O 1.5'
    selections_O_1_5 = ["W & O 1.5 (both untested)", "Win (untested) & O 1.5", "Win & O 1.5 (untested)",
                        "W & O 1.5", "DNB or O 1.5 (both untested)", "DNB (untested) or O 1.5",
                        "DNB or O 1.5 (untested)", "DNB or O 1.5", "O 1.5 (untested)", "O 1.5"]
    # Create a boolean mask for the condition for 'O 1.5'
    mask_O_1_5 = ((df['selection_match'] == 'no match') &
                  (df['selection'].isin(selections_O_1_5)) &
                  (df['total_score'] < 2))
    # Apply the condition and update the 'O_1_5' column
    df.loc[mask_O_1_5, 'O_1_5'] = df.loc[mask_O_1_5, 'predicted_total_score'] + 0.02
    # Define the selections for 'O 2.5'
    selections_O_2_5 = ["W & O 2.5 (both untested)", "Win (untested) & O 2.5", "Win & O 2.5 (untested)",
                        "W & O 2.5", "DNB or O 2.5 (both untested)", "DNB (untested) or O 2.5",
                        "DNB or O 2.5 (untested)", "DNB or O 2.5", "O 2.5 (untested)", "O 2.5"]
    # Create a boolean mask for the condition for 'O 2.5'
    mask_O_2_5 = ((df['selection_match'] == 'no match') &
                  (df['selection'].isin(selections_O_2_5)) &
                  (df['total_score'] < 3))
    # Apply the condition and update the 'O_2_5' column
    df.loc[mask_O_2_5, 'O_2_5'] = df.loc[mask_O_2_5, 'predicted_total_score'] + 0.02
    # Define the selections for 'U 4.5'
    selections_U_4_5 = ["W & U 4.5 (both untested)", "Win (untested) & U 4.5", "Win & U 4.5 (untested)",
                        "W & U 4.5", "U 4.5 (untested)", "U 4.5"]
    # Create a boolean mask for the condition for 'U 4.5'
    mask_U_4_5 = ((df['selection_match'] == 'no match') &
                  (df['selection'].isin(selections_U_4_5)) &
                  (df['total_score'] > 4))
    # Apply the condition and update the 'U_4_5' column
    df.loc[mask_U_4_5, 'U_4_5'] = df.loc[mask_U_4_5, 'predicted_total_score'] - 0.02
    return df
I apply it like this:
df = selection_update_weights(df)
First run works:
home_score away_score total_score score_difference predicted_total_score predicted_score_difference result predicted_result result_match Win DNB O_1_5 O_2_5 U_4_5 selection selection_match
3 2 0 2 2 12.370528 12.090888 home home match 1.1 0.7 2 3 4 W & O 2.5 (both untested) no match
9 2 0 2 2 11.439416 10.291339 home home match 1.1 0.7 2 3 4 W & O 2.5 (both untested) no match
10 2 0 2 2 11.226599 10.228954 home home match 1.1 0.7 2 3 4 W & O 2.5 (both untested) no match
11 1 5 6 4 12.069979 10.194557 away home no match 1.1 0.7 2 3 4 W & O 2.5 (both untested) no match
20 2 0 2 2 9.808659 9.049657 home home match 1.1 0.7 2 3 4 W & O 2.5 (both untested) no match
When I run the function, it gives:
home_score away_score total_score score_difference predicted_total_score predicted_score_difference result predicted_result result_match Win DNB O_1_5 O_2_5 U_4_5 selection selection_match
44 3 3 6 0 8.748172 8.135116 draw home no match 8.155116 0.7 2.000000 3.000000 4.0 W & O 2.5 (both untested) no match
50 1 0 1 1 8.605350 7.932909 home home match 1.100000 0.7 8.625350 8.625350 4.0 W & O 1.5 (both untested) no match
57 1 1 2 0 7.510030 7.750101 draw home no match 7.770101 0.7 2.000000 7.530030 4.0 W & O 1.5 (both untested) no match
62 0 1 1 1 8.895045 7.710740 away away match 1.100000 0.7 8.915045 8.915045 4.0 W & O 1.5 (both untested) no match
85 1 0 1 1 8.099853 7.444815 home home match 1.100000 0.7 8.119853 8.119853 4.0 W & O 1.5 (both untested) no match
However, subsequent runs don't yield any changes.
I have a very large dataframe, and this snippet shows rows where the weights don't get updated as expected. The weights are only partially updated, and I am not sure why.
df.head():
home_score away_score total_score score_difference predicted_total_score predicted_score_difference result predicted_result result_match Win DNB O_1_5 O_2_5 U_4_5 selection selection_match
44 3 3 6 0 8.748172 8.135116 draw home no match 1.1 0.7 2.0 3.000000 4.0 W & O 2.5 (both untested) no match
50 1 0 1 1 8.605350 7.932909 home home match 1.1 0.7 2.0 8.625350 4.0 W & O 1.5 (both untested) no match
57 1 1 2 0 7.510030 7.750101 draw home no match 1.1 0.7 2.0 7.530030 4.0 W & O 1.5 (both untested) no match
62 0 1 1 1 8.895045 7.710740 away away match 1.1 0.7 2.0 8.915045 4.0 W & O 1.5 (both untested) no match
85 1 0 1 1 8.099853 7.444815 home home match 1.1 0.7 2.0 8.119853 4.0 W & O 1.5 (both untested) no match
So when I apply it:
df = selection_update_weights(df)
I should ideally get:
home_score away_score total_score score_difference predicted_total_score predicted_score_difference result predicted_result result_match Win DNB O_1_5 O_2_5 U_4_5 selection selection_match
3 3 6 0 8.748172 8.135116 draw home no match 8.155116 0.7 2.0 3 4.0 W & O 2.5 (both untested) no match
1 0 1 1 8.605350 7.932909 home home match 1.100000 0.7 8.625350 8.625350 4.0 W & O 1.5 (both untested) no match
1 1 2 0 7.510030 7.750101 draw home no match 7.770101 0.7 2.0 7.530030 4.0 W & O 1.5 (both untested) no match
0 1 1 1 8.895045 7.710740 away away match 1.100000 0.7 8.915045 8.915045 4.0 W & O 1.5 (both untested) no match
1 0 1 1 8.099853 7.444815 home home match 1.100000 0.7 8.119853 8.119853 4.0 W & O 1.5 (both untested) no match
However, that's not happening, and the original dataframe is unaffected.
Would it help to break this down into explicit if-else checks for every row? The dataframe is very large, and the row-by-row calculations take about 20 minutes.
In the second loop, only the Win and O_2_5 columns are updated. Win is computed from the value of predicted_score_difference, and O_2_5 is computed from the value of predicted_total_score.
predicted_score_difference and predicted_total_score never change in selection_update_weights, so we can treat them as constants inside that function. Since you call the function multiple times and each call sets the columns from those same constants, the columns will never take a different value after the first call.
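To illustrate the point, here is a minimal sketch using a simplified stand-in for selection_update_weights. The helper `update_win` and the tiny dataframe are made up for this example, and the mask is reduced to two conditions. It only shows that a second call reproduces the values written by the first call.

```python
import pandas as pd

def update_win(df):
    # Hypothetical, simplified stand-in for selection_update_weights:
    # one mask and one target column only.
    mask = (df["selection_match"] == "no match") & (df["result"] != "draw")
    # The right-hand side depends only on predicted_score_difference,
    # which this function never modifies.
    df.loc[mask, "Win"] = df.loc[mask, "predicted_score_difference"] + 0.02
    return df

# Made-up sample data, not taken from the question.
df = pd.DataFrame({
    "selection_match": ["no match", "no match", "match"],
    "result": ["home", "draw", "away"],
    "predicted_score_difference": [8.0, 7.5, 6.0],
    "Win": [1.1, 1.1, 1.1],
})

first = update_win(df.copy())
second = update_win(first.copy())

# The second call writes exactly the same values as the first one.
print(first["Win"].equals(second["Win"]))  # True
```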
I'm not sure why you want to call selection_update_weights multiple times, but maybe you should update Win, O_2_5 (and the other columns) in terms of themselves, or update predicted_score_difference and predicted_total_score inside the selection_update_weights function.
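As a rough sketch of the first option, the update can be written in terms of the column's own current value, so that every call changes it again. The helper name `bump_win`, the `step` parameter, and the sample data are assumptions for illustration only; whether accumulating 0.02 per call is what you actually want depends on your weighting logic.

```python
import pandas as pd

def bump_win(df, step=0.02):
    # Hypothetical variant: adjust 'Win' relative to its current value
    # instead of recomputing it from an unchanging column.
    mask = (df["selection_match"] == "no match") & (df["result"] != "draw")
    df.loc[mask, "Win"] += step
    return df

# Made-up sample data, not taken from the question.
df = pd.DataFrame({
    "selection_match": ["no match", "no match", "match"],
    "result": ["home", "draw", "away"],
    "Win": [1.1, 1.1, 1.1],
})

# Each pass now moves the masked rows by another step.
for _ in range(3):
    df = bump_win(df)

print(df["Win"].round(2).tolist())
```

Only the first row satisfies the mask here, so it is the only one that grows with each pass; the other rows keep their original weight.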