I'm trying to estimate whether a team will win or lose based on these features:
['team_id', 'nb_games', 'team_gamesPlayed',
'team_pergame_wins', 'team_pergame_losses',
'team_pergame_goals', 'win_game']
The win_game column has two classes (lose = 0, win = 1). When I search for the best parameters for MLPClassifier with the code below, I end up with:
[activation='relu', alpha=0.01, hidden_layer_sizes=(200,100,50),
learning_rate='adaptive', max_iter=300, shuffle=False,
solver='adam', random_state=42]
The problem is that with those parameters, MLPClassifier returns 0 (lose) for every prediction it makes.
Is there a way to make it concentrate on 1 (win) in its predictions?
P.S. Before changing the parameters, I had 0.47 accuracy on 1 (wins) with: hidden_layer_sizes=(64, 32), max_iter=100, random_state=42.
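When a classifier collapses to predicting a single class, the first thing worth checking is the class balance of the target column. A minimal sketch using a toy frame in place of your df (only the win_game column name is taken from the post):

```python
import pandas as pd

# Toy stand-in for the real df loaded from df_2005-2021.csv
df = pd.DataFrame({'win_game': [0, 0, 0, 1, 0, 1, 0, 0]})

# value_counts(normalize=True) shows the class proportions;
# a heavy skew toward 0 often explains an all-zero classifier
print(df['win_game'].value_counts(normalize=True))
```

If the proportions are strongly skewed, accuracy-driven grid search will favour models that always predict the majority class.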
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score
csvpath = '/path/to/csv/'
df = pd.read_csv(f'{csvpath}df_2005-2021.csv')
parameter_space = {
'hidden_layer_sizes': [(200,100,50), (500, 300, 200, 50), (500,100,50), (500,200,50)],
'activation': ['relu', 'identity'],
'solver': ['adam'],
'alpha': [0.01, 0.001, 0.0001],
'learning_rate': ['adaptive', 'constant'],
'shuffle' : [False],
'max_iter' : [200, 300, 500],
}
all_col = ['team_id', 'nb_games', 'team_gamesPlayed', 'team_pergame_wins', 'team_pergame_losses', 'team_pergame_goals', 'win_game']
sc_col = ['team_id', 'nb_games', 'team_gamesPlayed', 'team_pergame_wins', 'team_pergame_losses', 'team_pergame_goals']
# note: the scaler is fitted on the full dataset here, before the train/test split
scaler = StandardScaler()
df[sc_col] = scaler.fit_transform(df[sc_col])
data = df[all_col]
X = data.iloc[:, :-1]
y = data.iloc[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=40)
mlp = MLPClassifier()
clf = GridSearchCV(mlp, parameter_space, n_jobs=-1, cv=3)
clf.fit(X_train, y_train)
# Best parameter set
print('Best parameters found:\n', clf.best_params_)
# All results
print('----------------------all results----------------------')
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
for mean, std, params in zip(means, stds, clf.cv_results_['params']):
print("%0.3f (+/-%0.03f) for %r" % (mean, std * 2, params))
y_true, y_pred = y_test , clf.predict(X_test)
from sklearn.metrics import classification_report
print('Results on the test set:')
print(classification_report(y_true, y_pred))
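For reference, MLPClassifier has no class_weight parameter, so one common way to bias predictions toward the positive class is to lower the decision threshold on predict_proba. A minimal sketch on synthetic data (the 0.3 cut-off is an arbitrary illustrative value, not a recommendation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic, imbalanced stand-in for the real data (~80% class 0, ~20% class 1)
X, y = make_classification(n_samples=1000, n_features=6,
                           weights=[0.8, 0.2], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300,
                    random_state=42).fit(X_tr, y_tr)

# Default 0.5 threshold vs. a lowered threshold that favours class 1
proba = clf.predict_proba(X_te)[:, 1]
default_preds = (proba >= 0.5).astype(int)
eager_preds = (proba >= 0.3).astype(int)  # flags more samples as wins
```

Resampling the training set (e.g. oversampling the minority class) is another option worth trying alongside threshold tuning.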