I'm trying to estimate whether a team will win or lose based on these features:
['team_id', 'nb_games', 'team_gamesPlayed',
'team_pergame_wins', 'team_pergame_losses',
'team_pergame_goals', 'win_game']
The win_game column has two classes (lose = 0, win = 1). When I search for the best parameters for MLPClassifier with the code below, I end up with:
[activation='relu', alpha=0.01, hidden_layer_sizes=(200,100,50),
learning_rate='adaptive', max_iter=300, shuffle=False,
solver='adam', random_state=42]
The problem is that with those parameters, MLPClassifier returns 0 (lose) for every prediction it makes.
Is there a way to make it concentrate on 1 (win) in its predictions?
P.S. Before changing the parameters, I had 0.47 accuracy on 1 (wins) with: hidden_layer_sizes=(64, 32), max_iter=100, random_state=42.
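When a classifier collapses to predicting a single class, the first thing worth checking is the class balance of the target column. A minimal sketch using a toy frame in place of your df (only the win_game column name is taken from the post):

```python
import pandas as pd

# Toy stand-in for the real df loaded from df_2005-2021.csv
df = pd.DataFrame({'win_game': [0, 0, 0, 1, 0, 1, 0, 0]})

# value_counts(normalize=True) shows the class proportions;
# a heavy skew toward 0 often explains an all-zero classifier
print(df['win_game'].value_counts(normalize=True))
```

If the proportions are strongly skewed, accuracy-driven grid search will favour models that always predict the majority class.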
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score
csvpath = '/path/to/csv/'
df = pd.read_csv(f'{csvpath}df_2005-2021.csv')
parameter_space = {
'hidden_layer_sizes': [(200,100,50), (500, 300, 200, 50), (500,100,50), (500,200,50)],
'activation': ['relu', 'identity'],
'solver': ['adam'],
'alpha': [0.01, 0.001, 0.0001],
'learning_rate': ['adaptive', 'constant'],
'shuffle' : [False],
'max_iter' : [200, 300, 500],
}
all_col = ['team_id', 'nb_games', 'team_gamesPlayed', 'team_pergame_wins', 'team_pergame_losses', 'team_pergame_goals', 'win_game']
sc_col = ['team_id', 'nb_games', 'team_gamesPlayed', 'team_pergame_wins', 'team_pergame_losses', 'team_pergame_goals']
# note: the scaler is fitted on the full dataset here, before the train/test split
scaler = StandardScaler()
df[sc_col] = scaler.fit_transform(df[sc_col])
data = df[all_col]
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=40)
mlp = MLPClassifier()
clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3)
clf.fit(X_train, y_train)
# Best parameter set
print('Best parameters found:\n', clf.best_params_)
# All results
print('----------------------all results----------------------')
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))
y_true, y_pred = y_test , clf.predict(X_test)
from sklearn.metrics import classification_report
print('Results on the test set:')
print(classification_report(y_true, y_pred))
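For reference, MLPClassifier has no class_weight parameter, so one common way to bias predictions toward the positive class is to lower the decision threshold on predict_proba. A minimal sketch on synthetic data (the 0.3 cut-off is an arbitrary illustrative value, not a recommendation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic, imbalanced stand-in for the real data (~80% class 0, ~20% class 1)
X, y = make_classification(n_samples=1000, n_features=6,
                           weights=[0.8, 0.2], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300,
                    random_state=42).fit(X_tr, y_tr)

# Default 0.5 threshold vs. a lowered threshold that favours class 1
proba = clf.predict_proba(X_te)[:, 1]
default_preds = (proba >= 0.5).astype(int)
eager_preds = (proba >= 0.3).astype(int)  # flags more samples as wins
```

Resampling the training set (e.g. oversampling the minority class) is another option worth trying alongside threshold tuning.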