NLP - Worse results when adding stemming or lemmatization for Sentiment Analysis


I'm trying to build a full pipeline of results for sentiment analysis on a smaller subset of the IMDB reviews (only 2k positive, 2k negative), so I want to show results at each stage:

i.e. without any pre-processing, then with basic cleaning (removing special characters and stopwords, lowercasing), then testing both stemming and lemmatization (separately) on top of the basic cleaning.
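For reference, the preprocessing stages look roughly like this (a minimal sketch using NLTK; the exact regex, tokenization, and stopword list are my assumptions, not necessarily identical to what I ran):

```python
import re

from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

STOPWORDS = set(stopwords.words("english"))  # requires nltk.download("stopwords")
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()             # requires nltk.download("wordnet")

def basic_clean(text):
    """Lowercase, strip non-letter characters, drop stopwords."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in STOPWORDS]

def clean_only(text):
    return " ".join(basic_clean(text))

def clean_and_stem(text):
    return " ".join(stemmer.stem(tok) for tok in basic_clean(text))

def clean_and_lemmatize(text):
    # Note: without POS tags, WordNetLemmatizer treats every token as a noun,
    # so verb forms like "hated" or "going" are left mostly untouched.
    return " ".join(lemmatizer.lemmatize(tok) for tok in basic_clean(text))
```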

After basic cleaning, accuracy jumps from 50% (which makes sense as a baseline for binary classification) to the low-to-mid 80s. Then after adding stemming or lemmatization, it either doesn't change or, for random forest, drops recall below 80%.

Why is this the case? Are my results normal? If so, how do you justify using either technique?

Also note that all of the models and feature extractors use sklearn's default parameters, so I haven't gotten to the model optimization part yet. Should I tune each of these three cases and then check whether stemming and lemmatization still perform worse?

Feature extraction: Bag of Words and TF-IDF

Models: SVM, Logistic Regression, Multinomial Naive Bayes and Random Forest
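Putting it together, the evaluation loop is essentially the following (a sketch with everything left at sklearn defaults; the 75/25 split, LinearSVC as the SVM, and the variable names are assumptions on my part):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# texts: list of 4,000 cleaned review strings; labels: "Positive"/"Negative"
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42
)

vectorizers = {"BOW": CountVectorizer(), "TF-IDF": TfidfVectorizer()}
models = {
    "SVM": LinearSVC(),
    "LR": LogisticRegression(),
    "MNB": MultinomialNB(),
    "RFC": RandomForestClassifier(),
}

for vec_name, vectorizer in vectorizers.items():
    # Fit the vocabulary on the training split only, then reuse it for test.
    Xtr = vectorizer.fit_transform(X_train)
    Xte = vectorizer.transform(X_test)
    for model_name, model in models.items():
        model.fit(Xtr, y_train)
        print(f"{model_name} {vec_name}")
        print(classification_report(y_test, model.predict(Xte)))
```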

Results:

Basic Cleaning (removing special characters and stopwords, lowercasing)

SVM BOW
              precision    recall  f1-score   support

    Positive       0.85      0.85      0.85       530
    Negative       0.83      0.83      0.83       470

    accuracy                           0.84      1000
   macro avg       0.84      0.84      0.84      1000
weighted avg       0.84      0.84      0.84      1000


SVM TF-IDF
              precision    recall  f1-score   support

    Positive       0.85      0.88      0.86       530
    Negative       0.86      0.83      0.84       470

    accuracy                           0.85      1000
   macro avg       0.86      0.85      0.85      1000
weighted avg       0.86      0.85      0.85      1000


LR BOW
              precision    recall  f1-score   support

    Positive       0.87      0.85      0.86       530
    Negative       0.83      0.85      0.84       470

    accuracy                           0.85      1000
   macro avg       0.85      0.85      0.85      1000
weighted avg       0.85      0.85      0.85      1000


LR TF-IDF
              precision    recall  f1-score   support

    Positive       0.89      0.82      0.85       530
    Negative       0.81      0.88      0.84       470

    accuracy                           0.85      1000
   macro avg       0.85      0.85      0.85      1000
weighted avg       0.85      0.85      0.85      1000


MNB BOW
              precision    recall  f1-score   support

    Positive       0.83      0.85      0.84       530
    Negative       0.82      0.81      0.82       470

    accuracy                           0.83      1000
   macro avg       0.83      0.83      0.83      1000
weighted avg       0.83      0.83      0.83      1000


MNB TF-IDF
              precision    recall  f1-score   support

    Positive       0.86      0.84      0.85       530
    Negative       0.82      0.85      0.83       470

    accuracy                           0.84      1000
   macro avg       0.84      0.84      0.84      1000
weighted avg       0.84      0.84      0.84      1000


RFC BOW
              precision    recall  f1-score   support

    Positive       0.85      0.80      0.82       530
    Negative       0.79      0.84      0.81       470

    accuracy                           0.82      1000
   macro avg       0.82      0.82      0.82      1000
weighted avg       0.82      0.82      0.82      1000


RFC TF-IDF
              precision    recall  f1-score   support

    Positive       0.84      0.81      0.83       530
    Negative       0.80      0.83      0.81       470

    accuracy                           0.82      1000
   macro avg       0.82      0.82      0.82      1000
weighted avg       0.82      0.82      0.82      1000

Basic Cleaning + Stemming

SVM BOW
              precision    recall  f1-score   support

    Positive       0.85      0.82      0.83       530
    Negative       0.80      0.83      0.82       470

    accuracy                           0.82      1000
   macro avg       0.82      0.82      0.82      1000
weighted avg       0.82      0.82      0.82      1000


SVM TF-IDF
              precision    recall  f1-score   support

    Positive       0.85      0.85      0.85       530
    Negative       0.83      0.83      0.83       470

    accuracy                           0.84      1000
   macro avg       0.84      0.84      0.84      1000
weighted avg       0.84      0.84      0.84      1000


LR BOW
              precision    recall  f1-score   support

    Positive       0.85      0.83      0.84       530
    Negative       0.81      0.84      0.83       470

    accuracy                           0.83      1000
   macro avg       0.83      0.83      0.83      1000
weighted avg       0.83      0.83      0.83      1000


LR TF-IDF
              precision    recall  f1-score   support

    Positive       0.89      0.81      0.85       530
    Negative       0.80      0.88      0.84       470

    accuracy                           0.84      1000
   macro avg       0.84      0.85      0.84      1000
weighted avg       0.85      0.84      0.84      1000


MNB BOW
              precision    recall  f1-score   support

    Positive       0.83      0.84      0.84       530
    Negative       0.82      0.81      0.82       470

    accuracy                           0.83      1000
   macro avg       0.83      0.83      0.83      1000
weighted avg       0.83      0.83      0.83      1000


MNB TF-IDF
              precision    recall  f1-score   support

    Positive       0.87      0.83      0.85       530
    Negative       0.82      0.86      0.84       470

    accuracy                           0.84      1000
   macro avg       0.84      0.84      0.84      1000
weighted avg       0.84      0.84      0.84      1000


RFC BOW
              precision    recall  f1-score   support

    Positive       0.84      0.77      0.80       530
    Negative       0.76      0.83      0.79       470

    accuracy                           0.80      1000
   macro avg       0.80      0.80      0.80      1000
weighted avg       0.80      0.80      0.80      1000


RFC TF-IDF
              precision    recall  f1-score   support

    Positive       0.83      0.79      0.81       530
    Negative       0.78      0.81      0.80       470

    accuracy                           0.80      1000
   macro avg       0.80      0.80      0.80      1000
weighted avg       0.80      0.80      0.80      1000

Basic Cleaning + Lemmatization

SVM BOW
              precision    recall  f1-score   support

    Positive       0.84      0.83      0.83       530
    Negative       0.81      0.82      0.82       470

    accuracy                           0.83      1000
   macro avg       0.83      0.83      0.83      1000
weighted avg       0.83      0.83      0.83      1000


SVM TF-IDF
              precision    recall  f1-score   support

    Positive       0.85      0.86      0.86       530
    Negative       0.84      0.83      0.84       470

    accuracy                           0.85      1000
   macro avg       0.85      0.85      0.85      1000
weighted avg       0.85      0.85      0.85      1000


LR BOW
              precision    recall  f1-score   support

    Positive       0.86      0.84      0.85       530
    Negative       0.82      0.84      0.83       470

    accuracy                           0.84      1000
   macro avg       0.84      0.84      0.84      1000
weighted avg       0.84      0.84      0.84      1000


LR TF-IDF
              precision    recall  f1-score   support

    Positive       0.88      0.81      0.84       530
    Negative       0.80      0.87      0.84       470

    accuracy                           0.84      1000
   macro avg       0.84      0.84      0.84      1000
weighted avg       0.84      0.84      0.84      1000


MNB BOW
              precision    recall  f1-score   support

    Positive       0.82      0.85      0.83       530
    Negative       0.82      0.80      0.81       470

    accuracy                           0.82      1000
   macro avg       0.82      0.82      0.82      1000
weighted avg       0.82      0.82      0.82      1000


MNB TF-IDF
              precision    recall  f1-score   support

    Positive       0.85      0.83      0.84       530
    Negative       0.81      0.84      0.82       470

    accuracy                           0.83      1000
   macro avg       0.83      0.83      0.83      1000
weighted avg       0.83      0.83      0.83      1000


RFC BOW
              precision    recall  f1-score   support

    Positive       0.84      0.78      0.81       530
    Negative       0.77      0.83      0.80       470

    accuracy                           0.80      1000
   macro avg       0.80      0.81      0.80      1000
weighted avg       0.81      0.80      0.80      1000


RFC TF-IDF
              precision    recall  f1-score   support

    Positive       0.84      0.81      0.82       530
    Negative       0.80      0.82      0.81       470

    accuracy                           0.82      1000
   macro avg       0.82      0.82      0.82      1000
weighted avg       0.82      0.82      0.82      1000

1 Answer

Darren Cook:

I would assume the scores you are getting are about as good as they will get using bag-of-words or TF-IDF approaches.

For instance, the sentiment doesn't change between "I hated every minute of this movie, the plot was going nowhere" and "I hate every minute of this movie, the plot is go nowhere", and the latter is roughly what the former looks like after stemming.
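To make that concrete, here's a quick check with NLTK's Porter stemmer (my addition, not part of the original answer): stemming the fluent sentence produces almost exactly the degraded one, so the model sees them the same either way. The only surviving difference, "wa" vs. "is", carries no sentiment.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

past = "I hated every minute of this movie, the plot was going nowhere"
present = "I hate every minute of this movie, the plot is go nowhere"

def stem_sentence(sentence):
    # Lowercase, drop the comma, and stem each whitespace token.
    tokens = sentence.lower().replace(",", "").split()
    return " ".join(stemmer.stem(tok) for tok in tokens)

print(stem_sentence(past))     # i hate everi minut of thi movi the plot wa go nowher
print(stem_sentence(present))  # i hate everi minut of thi movi the plot is go nowher
```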