IsolationForest is always predicting 1

260 Views Asked by hafiz031 At 20 June 2021 at 21:41

I am working on a project to detect out-of-domain text input using IsolationForest and tf-idf features. Here is a summary of what I do:

TRAINING

On tf-idf:
- Fit CountVectorizer() on the in-domain dataset and transform the dataset with it.
- Fit a TfidfTransformer() on the output of this CountVectorizer() and save the transformer (to use it at test time).
- Then transform the training data using the fitted TfidfTransformer().
- Save both the CountVectorizer()'s vocabulary_ and the TfidfTransformer() object using pickle for use at test time.
On IsolationForest:
- Take the transformed in-domain dataset and train an IsolationForest() novelty detector on it.
- Save the model using joblib.
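
The training steps above can be sketched roughly as follows. This is only an illustration of the described pipeline, not the actual project code: the toy corpus, the file names, the temporary directory, and `random_state=0` are all assumptions made for the example.

```python
import os
import pickle
import tempfile

import joblib
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Hypothetical toy corpus standing in for the real in-domain dataset.
in_domain_texts = [
    "how do I reset my password",
    "I cannot log in to my account",
    "please change the email on my account",
    "my password reset link has expired",
]

# Token counts first, then tf-idf weighting fitted on those counts.
count_vectorizer = CountVectorizer()
counts = count_vectorizer.fit_transform(in_domain_texts)
tfidf_transformer = TfidfTransformer()
train_features = tfidf_transformer.fit_transform(counts)

# Train the detector on in-domain features only (sparse input is accepted).
forest = IsolationForest(random_state=0)
forest.fit(train_features)

# Persist everything needed at test time (paths are hypothetical).
model_dir = tempfile.mkdtemp()
vocab_path = os.path.join(model_dir, "vocabulary.pkl")
tfidf_path = os.path.join(model_dir, "tfidf_transformer.pkl")
forest_path = os.path.join(model_dir, "isolation_forest.joblib")

with open(vocab_path, "wb") as f:
    pickle.dump(count_vectorizer.vocabulary_, f)
with open(tfidf_path, "wb") as f:
    pickle.dump(tfidf_transformer, f)
joblib.dump(forest, forest_path)

print("feature matrix shape:", train_features.shape)
```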

TESTING

- Load all of the saved models.
- Compute the tf-idf features of the current out-of-domain input text by repeating the same steps as in training (transformations only, no fitting).
- Predict whether the input is out-of-domain using the saved IsolationForest model.
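
A minimal sketch of the test-time steps, under a few assumptions: the saved artifacts are simulated in memory with `pickle.dumps` (rather than files and joblib) so the example runs on its own, and the toy texts are hypothetical. The part worth noting is that a CountVectorizer rebuilt from a saved `vocabulary_` can call `transform` without being fitted again.

```python
import pickle

from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# --- Stand-in for the artifacts saved at training time (toy data) ---
train_texts = [
    "how do I reset my password",
    "I cannot log in to my account",
    "please change the email on my account",
    "my password reset link has expired",
]
train_vectorizer = CountVectorizer()
train_tfidf = TfidfTransformer()
train_features = train_tfidf.fit_transform(train_vectorizer.fit_transform(train_texts))
train_forest = IsolationForest(random_state=0).fit(train_features)

saved_vocabulary = pickle.dumps(train_vectorizer.vocabulary_)
saved_tfidf = pickle.dumps(train_tfidf)
saved_forest = pickle.dumps(train_forest)

# --- Test time: load, transform only, predict ---
vocabulary = pickle.loads(saved_vocabulary)
loaded_tfidf = pickle.loads(saved_tfidf)
loaded_forest = pickle.loads(saved_forest)

# A fixed vocabulary lets transform() run without a new fit.
test_vectorizer = CountVectorizer(vocabulary=vocabulary)

test_texts = ["what is the weather tomorrow", "reset my password"]
test_counts = test_vectorizer.transform(test_texts)
test_features = loaded_tfidf.transform(test_counts)

predictions = loaded_forest.predict(test_features)        # 1 = inlier, -1 = outlier
scores = loaded_forest.decision_function(test_features)   # negative = outlier

for text, label, score in zip(test_texts, predictions, scores):
    print(f"{label:+d}  {score:+.4f}  {text}")
```

Printing the `decision_function` value next to each label shows how far an input sits from the decision threshold, which the label alone hides.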

But I have found that even though the tf-idf features are quite different for each of my test inputs, IsolationForest always predicts 1.

What is probably going wrong?

NB: I also tried feeding dummy vectors to the IsolationForest model, mimicking the output of the tf-idf transformer, to check whether the tf-idf module is responsible. No matter which random vector I provide, I always get 1 as output from IsolationForest. Also note that tf-idf produces a lot of features (tokens); in my case the count is 48015.
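
One way to make the dummy-vector check more informative is to look at the raw scores behind the labels. The sketch below is not a diagnosis of the problem above; it only illustrates, on small random dense data (an assumption, far smaller than 48015 features), that `predict` is a threshold on `decision_function` at 0 and that `decision_function` equals `score_samples` minus the fitted `offset_`.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Hypothetical stand-in for tf-idf features: small dense random vectors.
X_train = rng.random((200, 50))
forest = IsolationForest(random_state=0).fit(X_train)

# Dummy probe vectors, similar in spirit to the random vectors tried above.
X_probe = rng.random((5, 50))

labels = forest.predict(X_probe)             # 1 = inlier, -1 = outlier
scores = forest.decision_function(X_probe)   # thresholded at 0 by predict()
raw_scores = forest.score_samples(X_probe)   # scores before subtracting offset_

print("offset_:", forest.offset_)
for label, score, raw in zip(labels, scores, raw_scores):
    print(f"label={label:+d}  decision={score:+.4f}  raw={raw:+.4f}")
```

If the decision values vary between inputs but all stay above 0, the model is distinguishing the inputs and only the threshold keeps every label at 1; the `contamination` parameter controls where that threshold is placed.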
