How to perform a t-test for the difference of means to assess which algorithm achieves a higher F1 score?


I am working on a project in which the expected outcome is a determination of which classifier performs better on the basis of F1 score. I am conducting a t-test for the difference of means to assess which algorithm achieves the higher F1 score.

I have the F1 score of both classifiers: algo_A: 0.589744 and algo_B: 0.641026.

Following is the code that I am using to meet my project requirements, but instead of any results it shows me NaN. How can I fix this issue?

from scipy import stats

t_value, p_value = stats.ttest_ind(f1_score_Algo_A, f1_score_Algo_B)
print('Test statistic is %.6f' % t_value)
print('p-value for two tailed test is %.6f' % p_value)

I am getting the following output

Test statistic is nan
p-value for two tailed test is nan

My expected result is to determine which algorithm has performed better, based on the t-test statistic and the p-value.


1 Answer

Sandipan Dey

Try this. Note that scipy.stats.ttest_ind expects samples (here, lists of scores) rather than single values — with only one observation per group the variance is undefined, which is why you got NaN. In the code below, both f1_scores_Algo_A and f1_scores_Algo_B are lists and can be treated as two independent samples of scores:

from sklearn.metrics import f1_score
from scipy import stats

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from mlxtend.data import iris_data
from sklearn.model_selection import train_test_split

algo_A = LogisticRegression(random_state=1, max_iter=1000)  # try your algos / models
algo_B = DecisionTreeClassifier(random_state=1, max_depth=3)

X, y = iris_data() # try your dataset

f1_scores_Algo_A, f1_scores_Algo_B = [], []

# Evaluate both models on 100 random train/test splits to build up
# two samples of F1 scores.
for i in range(100):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

    y_pred = algo_A.fit(X_train, y_train).predict(X_test)
    f1_scores_Algo_A.append(f1_score(y_test, y_pred, average='micro'))

    y_pred = algo_B.fit(X_train, y_train).predict(X_test)
    f1_scores_Algo_B.append(f1_score(y_test, y_pred, average='micro'))
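As an aside, this also shows why the original code returned NaN: ttest_ind needs at least two observations per sample to estimate a variance, and with a single F1 score per classifier the statistic is undefined. A minimal sketch, using the two single scores from the question and some made-up multi-score lists:

```python
import warnings
from scipy import stats

# One observation per group: no variance can be estimated, so both the
# statistic and the p-value are NaN -- exactly the symptom in the question.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the small-sample warnings
    t_nan, p_nan = stats.ttest_ind([0.589744], [0.641026])
print(t_nan, p_nan)  # nan nan

# With several scores per group the test is well-defined.
t_value, p_value = stats.ttest_ind([0.58, 0.60, 0.59, 0.61],
                                   [0.64, 0.63, 0.65, 0.62])
print('t = %.3f, p = %.4f' % (t_value, p_value))
```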

The next plot shows the distribution of F1 scores obtained from the different train-test splits for the two models.

[Plot: overlaid distributions of the F1 scores of the two models across the train-test splits]
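The answer's original plot is not reproduced here; a minimal matplotlib sketch that draws comparable overlaid histograms, with short hypothetical score lists standing in for the two lists built in the loop above:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; safe to run headless
import matplotlib.pyplot as plt

# Hypothetical scores standing in for f1_scores_Algo_A / f1_scores_Algo_B
# computed in the loop above.
f1_scores_Algo_A = [0.92, 0.95, 0.89, 0.94, 0.91, 0.93, 0.90, 0.96]
f1_scores_Algo_B = [0.90, 0.93, 0.88, 0.92, 0.90, 0.91, 0.89, 0.94]

plt.hist(f1_scores_Algo_A, bins=5, alpha=0.5, label='Algo A (LogisticRegression)')
plt.hist(f1_scores_Algo_B, bins=5, alpha=0.5, label='Algo B (DecisionTreeClassifier)')
plt.xlabel('F1 score (micro-averaged)')
plt.ylabel('count')
plt.legend()
plt.savefig('f1_distributions.png')
```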

Now we can run the two-sample (independent) t-test:

t_value, p_value = stats.ttest_ind(f1_scores_Algo_A, f1_scores_Algo_B)
print('Test statistic is %.6f' % t_value)
# Test statistic is 2.457321
print('p-value for two tailed test is %.6f' % p_value)
# p-value for two tailed test is 0.014858

Since the p-value is below 0.05, we can reject the null hypothesis (that the two independent samples have identical mean scores) at the 5% level of significance.
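One refinement worth noting (not in the original answer): because both models are scored on the same 100 splits, the two score lists are paired rather than independent, so scipy.stats.ttest_rel is arguably the more appropriate test. A sketch, with hypothetical paired scores standing in for f1_scores_Algo_A and f1_scores_Algo_B:

```python
from scipy import stats

# Hypothetical paired scores standing in for f1_scores_Algo_A and
# f1_scores_Algo_B from the loop above; each pair comes from the same
# train/test split, so the samples are paired, not independent.
f1_scores_Algo_A = [0.92, 0.95, 0.89, 0.94, 0.91, 0.93, 0.90, 0.96]
f1_scores_Algo_B = [0.90, 0.93, 0.88, 0.92, 0.90, 0.91, 0.89, 0.94]

# ttest_rel compares the per-split differences, removing the
# between-split variance that ttest_ind treats as noise.
t_value, p_value = stats.ttest_rel(f1_scores_Algo_A, f1_scores_Algo_B)
print('Paired test statistic is %.6f' % t_value)
print('p-value for two tailed test is %.6f' % p_value)
```

Here Algo A scores higher on every split, so the paired statistic is positive and the p-value small.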