How to evaluate the performance of machine learning classifiers in Python using a paired t-test?


I am quite a beginner in machine learning. I am trying to conduct a t-test for the difference of means to assess which algorithm achieves a higher F1 score. I have the results of both algorithms: the F1 score for algorithm A is 0.63, and it is 0.89 for algorithm B.

I have applied the following code, but I am unable to sort out the error and do not understand it very well. How can I compare the two algorithms and get performance results from hypothesis testing?

X = data_frame.iloc[:, 3:]
y = data_frame.iloc[:,2:-7]

from mlxtend.evaluate import paired_ttest_5x2cv

t, p = paired_ttest_5x2cv(estimator1=f1_score_Algo_A, estimator2=f1_score_Algo_B, X=X, y=y)
alpha = 0.05

print('t statistic: %.3f' % t)
print('alpha ', alpha)
print('p value: %.3f' % p)

if p > alpha:
  print("Fail to reject null hypothesis")
else:
  print("Reject null hypothesis")
from mlxtend.evaluate import paired_ttest_5x2cv
----> t, p = paired_ttest_5x2cv(estimator1=lr_f1,estimator2=dt_f1, X=X, y=y, random_seed=1)
alpha = 0.05
print('t statistic: %.3f' % t)
AttributeError: 'numpy.float64' object has no attribute '_estimator_type'

The expected outcome is a conclusion about which algorithm performs better on the basis of F1 score.


Answer by Sandipan Dey:

The function paired_ttest_5x2cv() expects the estimator objects (the models to be compared) as inputs, not their F1 scores.

Here is the error reproduced with the iris dataset (try it with your dataset) and a couple of models (an LR and a DT model; try it with your own models):

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from mlxtend.data import iris_data
from sklearn.model_selection import train_test_split

from sklearn.metrics import f1_score

X, y = iris_data()
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.25,
                     random_state=123)
algo_A = LogisticRegression(random_state=1, max_iter=1000)
algo_B = DecisionTreeClassifier(random_state=1, max_depth=1)

y_pred = algo_A.fit(X_train, y_train).predict(X_test)
f1_score_Algo_A = f1_score(y_test, y_pred, average='micro')

y_pred = algo_B.fit(X_train, y_train).predict(X_test)
f1_score_Algo_B = f1_score(y_test, y_pred, average='micro')

print(f'Algo A score: {f1_score_Algo_A}, Algo B score: {f1_score_Algo_B}')

from mlxtend.evaluate import paired_ttest_5x2cv

t, p = paired_ttest_5x2cv(estimator1=f1_score_Algo_A, estimator2=f1_score_Algo_B, X=X, y=y)  # wrong: passes scores, not estimators
#t, p = paired_ttest_5x2cv(estimator1=algo_A, estimator2=algo_B, X=X, y=y)
alpha = 0.05

print('t statistic: %.3f' % t)
print('alpha ', alpha)
print('p value: %.3f' % p)

if p > alpha:
  print("Fail to reject null hypothesis")
else:
  print("Reject null hypothesis")

# if estimator1._estimator_type == "classifier":
#   ^^^^^^^^^^^^^^^^^^^^^^^^^^
# AttributeError: 'numpy.float64' object has no attribute '_estimator_type'

Now pass the estimator objects instead, and it works:

print(f'Algo A score: {f1_score_Algo_A}, Algo B score: {f1_score_Algo_B}')

from mlxtend.evaluate import paired_ttest_5x2cv

t, p = paired_ttest_5x2cv(estimator1=algo_A, estimator2=algo_B, X=X, y=y)
alpha = 0.05

print('t statistic: %.3f' % t)
print('alpha ', alpha)
print('p value: %.3f' % p)

if p > alpha:
  print("Fail to reject null hypothesis")
else:
  print("Reject null hypothesis")

# Algo A score: 0.9736842105263158, Algo B score: 0.631578947368421
# t statistic: 8.000
# alpha  0.05
# p value: 0.000
# Reject null hypothesis

Note that the F1 scores computed above are not used during the paired t-test; they are computed on the held-out test set only to get an idea of the models' performance. The actual fold scores are computed internally, on the CV splits, by the test's scoring function (accuracy by default for classifiers; pass e.g. scoring='f1_micro' to paired_ttest_5x2cv to have the test compare F1 scores instead).
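To see what happens under the hood, the 5×2cv paired t-test (Dietterich, 1998) can also be computed by hand with scikit-learn and SciPy, scoring each fold with F1 directly. This is a minimal sketch, not mlxtend's actual implementation; the helper name fold_f1_diff and the per-repetition loop structure are my own:

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
algo_A = LogisticRegression(random_state=1, max_iter=1000)
algo_B = DecisionTreeClassifier(random_state=1, max_depth=1)

def fold_f1_diff(train_idx, test_idx):
    # F1 difference (A minus B) on one CV fold, refitting both models.
    preds_A = algo_A.fit(X[train_idx], y[train_idx]).predict(X[test_idx])
    preds_B = algo_B.fit(X[train_idx], y[train_idx]).predict(X[test_idx])
    return (f1_score(y[test_idx], preds_A, average='micro')
            - f1_score(y[test_idx], preds_B, average='micro'))

s_squared = []      # one variance estimate per repetition
first_diff = None   # score difference on the very first fold
for rep in range(5):  # 5 repetitions of 2-fold CV
    cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=rep)
    diffs = [fold_f1_diff(tr, te) for tr, te in cv.split(X, y)]
    if first_diff is None:
        first_diff = diffs[0]
    mean_diff = np.mean(diffs)
    s_squared.append(sum((d - mean_diff) ** 2 for d in diffs))

# Dietterich's statistic: first-fold difference over the averaged variance,
# compared against a t distribution with 5 degrees of freedom.
t_stat = first_diff / np.sqrt(np.mean(s_squared))
p_value = 2 * stats.t.sf(abs(t_stat), df=5)
print(f't statistic: {t_stat:.3f}, p value: {p_value:.3f}')
```

Because the LR and the depth-1 DT differ substantially on iris, the test should reject the null hypothesis of equal F1 performance at alpha = 0.05.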