Highly imbalanced Alzheimer's Disease MRI image dataset


I am currently working on my final year project and would appreciate your opinion. My dataset consists of 4 classes:

Mild demented - 896 images

Moderate demented - 64 images

Non demented - 3200 images

Very Mild demented - 2240 images

As you can see, the moderate demented and mild demented classes are heavily underrepresented. I am therefore exploring ways to handle imbalanced data, and I am considering data augmentation or SMOTE to increase the number of samples in the minority classes. However, I found that data augmentation should be applied to the training set only. In my case, I want to rebalance the data before splitting it, to make sure the classes are balanced. What should I do? Can anyone help me?

I have tried data augmentation on the training set only, after splitting the data. However, my supervisor advised that I should perhaps use SMOTE to oversample the images.

Answer by Chih-Hao Liu:

Data imbalance is the central problem in "long-tail learning," which deals with datasets that follow a long-tail distribution: a few classes have many samples, while the remaining classes have very few.


There are several methods available to handle the data imbalance problem. One of the simplest, and often an effective one, is cost-sensitive learning, which gives each class an importance weight in the loss that is inversely proportional to its number of samples, so that mistakes on rare classes cost more.
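To make the idea concrete, here is a minimal sketch of a class-weighted cross-entropy in plain Python. The function name `weighted_cross_entropy`, the toy probabilities, and the weights are all assumptions chosen for illustration; in practice you would normally use the weighted loss that your deep learning framework already provides.

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Mean class-weighted cross-entropy over a batch (illustrative sketch).

    probs: list of per-sample predicted probability lists
    labels: list of true class indices
    class_weights: list with one weight per class
    """
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]           # weight of the true class
        total += -w * math.log(p[y])   # weighted negative log-likelihood
        weight_sum += w
    # Normalize by the total weight so the loss scale stays comparable
    return total / weight_sum

# Toy batch: sample 0 belongs to the majority class (0), sample 1 to the rare class (1)
probs = [[0.9, 0.1], [0.6, 0.4]]
labels = [0, 1]

uniform = weighted_cross_entropy(probs, labels, [1.0, 1.0])
weighted = weighted_cross_entropy(probs, labels, [1.0, 10.0])
print(uniform, weighted)
```

With the larger weight on class 1, the poorly predicted rare-class sample dominates the loss, which is what pushes the model to pay more attention to it.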

For instance, your dataset contains 6,400 images in total. The "Moderate demented" class has 64 images, so its class importance weight is 6,400/64 = 100. The "Non demented" class has 3,200 images, so its class importance weight is 6,400/3,200 = 2.
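The same arithmetic for all four classes can be sketched in a few lines of Python. The class counts come from the question; the helper name `inverse_frequency_weights` is made up for this example.

```python
# Class counts taken from the question
counts = {
    "Mild demented": 896,
    "Moderate demented": 64,
    "Non demented": 3200,
    "Very mild demented": 2240,
}

def inverse_frequency_weights(class_counts):
    """Weight each class by total_samples / class_samples."""
    total = sum(class_counts.values())
    return {name: total / n for name, n in class_counts.items()}

weights = inverse_frequency_weights(counts)
for name, w in weights.items():
    print(f"{name}: {w:.2f}")
# Mild demented: 7.14
# Moderate demented: 100.00
# Non demented: 2.00
# Very mild demented: 2.86
```

Weights like these are typically passed to a weighted loss, for example through the `weight` argument of PyTorch's `torch.nn.CrossEntropyLoss` or the `class_weight` argument of Keras `Model.fit`. Because they only change the loss and create no new images, they can be computed from the training split after splitting, so nothing leaks into the validation or test data.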

Reference: https://samer-baslan.medium.com/an-introduction-to-deep-long-tailed-learning-414881a2519