Isolation Forest getting accuracy score 0.0

446 Views Asked by Gabriel Rodrigues At 23 August 2022 at 20:42

Edit: please share comments as I'm learning to post good questions

I'm trying to train IsolationForest() on this dataset. I then need to apply the trained model to another dataset with altered qualities, predict the quality values, and fetch all wines with quality 8 and 9.

However, I'm having problems with it: the accuracy in the classification report is 0.0:

print(classification_report(y_test, prediction))

              precision    recall  f1-score   support

          -1       0.00      0.00      0.00       0.0
           1       0.00      0.00      0.00       0.0
           3       0.00      0.00      0.00     866.0
           4       0.00      0.00      0.00     829.0
           5       0.00      0.00      0.00     841.0
           6       0.00      0.00      0.00     861.0
           7       0.00      0.00      0.00     822.0
           8       0.00      0.00      0.00     886.0
           9       0.00      0.00      0.00     851.0

    accuracy                           0.00    5956.0
   macro avg       0.00      0.00      0.00    5956.0
weighted avg       0.00      0.00      0.00    5956.0

I don't know if it's a hyperparameter issue, or if I'm cleaning the data incorrectly or passing the wrong parameters. I have already tried it both with and without SMOTE. I want to reach an accuracy of at least 90%.

I'll leave the shared drive link public for dataset verification:

https://drive.google.com/drive/folders/18_sOSIZZw9DCW7ftEKuOG4aIzGXoasFe?usp=sharing

Here's my code:

from sklearn.preprocessing import OrdinalEncoder
from sklearn.ensemble import IsolationForest
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from sklearn.metrics import classification_report,confusion_matrix

df = pd.read_csv('wines.csv')

df.head(5)

ordinalEncoder = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-99).fit(df[['color']])
df[['color']] = ordinalEncoder.transform(df[['color']])

df.info()

df['color'] = df['color'].astype(int)

df.head(3)

stm = SMOTE(k_neighbors=4)
x_smote = df.drop('quality',axis=1)
y_smote = df['quality']
x_smote,y_smote = stm.fit_resample(x_smote,y_smote)

print(x_smote.shape,y_smote.shape)

x_smote.columns

scaler = StandardScaler()
X = scaler.fit_transform(x_smote)
y = y_smote

X.shape, y.shape

x_train, x_test, y_train, y_test = train_test_split(X,y,test_size=0.3)

from sklearn.ensemble import IsolationForest
from sklearn.metrics import hamming_loss

iforest = IsolationForest(n_estimators=200, max_samples=0.1, contamination=0.10, max_features=1.0, bootstrap=False, n_jobs=-1, 
                            random_state=None, verbose=0, warm_start=False)

iforest_fit = iforest.fit(x_train,y_train)

prediction = iforest_fit.predict(x_test)

print (prediction.shape, y_test.shape)

y.value_counts()

prediction

print(confusion_matrix(y_test, prediction))
hamming_loss(y_test, prediction)

from sklearn.metrics import classification_report
print(classification_report(y_test, prediction))

**blackraven** · Answer 1 · 2022-08-24T00:57:34.877000

May I know why you chose Isolation Forest as your model? This article says that Isolation Forest is an unsupervised learning algorithm for anomaly detection.

When I print some samples of the Isolation Forest predictions next to the actual labels, I get the following results, which show why the accuracy score is 0.0: the model only outputs 1 (inlier) or -1 (outlier), never a quality label:

print(list(prediction[0:15]))
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(list(y_test[0:15]))
[9, 4, 4, 7, 9, 3, 6, 7, 4, 8, 8, 7, 3, 8, 5]
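
To make the mismatch concrete, here is a minimal sketch on synthetic data (the random features and labels below are assumptions standing in for wines.csv, not the real dataset). It shows that IsolationForest accepts a y argument in fit() but ignores it, and that predict() can only return -1 or 1, so no prediction can ever equal a quality label between 3 and 9:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the wine data (assumption: not the real wines.csv).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))        # 500 rows, 5 numeric features
y = rng.integers(3, 10, size=500)    # fake quality labels in 3..9

iforest = IsolationForest(n_estimators=50, contamination=0.10, random_state=0)
iforest.fit(X, y)                    # y is accepted for API consistency but ignored
pred = iforest.predict(X)            # 1 = inlier, -1 = outlier

pred_values = set(np.unique(pred).tolist())
acc = accuracy_score(y, pred)

print("values returned by predict():", pred_values)
print("accuracy against quality labels:", acc)
```

This also explains the two extra rows (-1 and 1) with zero support in the classification report from the question: they are the only values the model ever predicted.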

Both the wines.csv dataset and your code point towards a multi-class classification problem, so I have used RandomForestClassifier() for the second part of your code:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import hamming_loss

model = RandomForestClassifier()
model.fit(x_train,y_train)
prediction = model.predict(x_test)

print(prediction[0:15])    #see 15 samples of prediction
[3, 9, 5, 5, 7, 9, 7, 6, 9, 8, 5, 9, 8, 3, 3]

print(list(y_test[0:15]))    #see 15 samples of actual truth
[3, 9, 5, 6, 6, 9, 7, 5, 9, 8, 5, 9, 8, 3, 3]

print(confusion_matrix(y_test, prediction))
[[842   0   0   0   0   0   0]
 [  2 815  17   8   1   1   0]
 [  8  50 690 130  26   2   0]
 [  2  28 152 531 128  16   0]
 [  4   1  15  66 716  32   3]
 [  0   1   0   4  12 833   0]
 [  0   0   0   0   0   0 820]]

print('hamming_loss =', hamming_loss(y_test, prediction))
hamming_loss = 0.11903962390866353

print(classification_report(y_test, prediction))
              precision    recall  f1-score   support

           3       0.98      1.00      0.99       842
           4       0.91      0.97      0.94       844
           5       0.79      0.76      0.78       906
           6       0.72      0.62      0.67       857
           7       0.81      0.86      0.83       837
           8       0.94      0.98      0.96       850
           9       1.00      1.00      1.00       820

    accuracy                           0.88      5956
   macro avg       0.88      0.88      0.88      5956
weighted avg       0.88      0.88      0.88      5956
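
As a side note on the two metrics printed above: for a single-label multi-class problem, the Hamming loss is simply the fraction of misclassified samples, so it equals 1 minus the accuracy (0.119 and 0.88 here). A small sketch with made-up labels (the two lists are illustrative assumptions, not taken from the dataset):

```python
from sklearn.metrics import accuracy_score, hamming_loss

# Toy labels for illustration only: 3 of the 8 predictions are wrong.
y_true = [3, 9, 5, 6, 6, 9, 7, 5]
y_pred = [3, 9, 5, 5, 7, 9, 7, 6]

loss = hamming_loss(y_true, y_pred)   # fraction of labels that differ
acc = accuracy_score(y_true, y_pred)  # fraction of labels that match

print("hamming_loss =", loss)
print("accuracy     =", acc)
print("sum          =", loss + acc)
```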

The accuracy is already 0.88 even before tuning hyperparameters.
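
For the last step in the question (fetching all wines with quality 8 and 9 from another dataset), one possible approach is to predict on the new rows and filter with a boolean mask. The sketch below is only an illustration under stated assumptions: the data is synthetic, and the names x_new, top_mask, top_wines and top_quality are hypothetical. With the real data, the new dataset would need to go through the same fitted encoder and scaler (transform(), not fit_transform()) before prediction.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for wines.csv: 7 classes relabelled as qualities 3..9.
X, y = make_classification(
    n_samples=1400, n_features=12, n_informative=8,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
y = y + 3

# x_new plays the role of the second dataset whose quality is unknown.
x_train, x_new, y_train, _ = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0)
model.fit(x_train, y_train)

# Predict a quality for every new row, then keep only predicted 8s and 9s.
predicted_quality = model.predict(x_new)
top_mask = np.isin(predicted_quality, [8, 9])
top_wines = x_new[top_mask]
top_quality = predicted_quality[top_mask]

print(top_wines.shape[0], "rows predicted as quality 8 or 9")
```

If the new dataset is a pandas DataFrame, the same mask can be applied to the original (unscaled) frame so that the selected wines keep their readable column values.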
