I'm now doing information extraction using NER. My dataset domain (mostly) in computer science. It contains label/tag: "TUJUAN", "METODE", and "TEMUAN". The problem is almost 80-90% data are labeled O which means it has no meaningful tag. The precision and recall from the model is 0, while the accuracy is about 0.78. I use IndoBERT as model for NER task.
I suspect this happens because my dataset is extremely unbalanced. At first, I want to modify the loss function based on BertForTokenClassification documentation to Dice Loss or Focal Loss as it mentioned here but I don't know how since my Python knowledge is still very weak.
class BertForTokenClassification(BertPreTrainedModel):
def __init__(self, config):
super().__init__(config)
self.num_labels = config.num_labels
self.bert = BertModel(config, add_pooling_layer=False)
classifier_dropout = (
config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
)
self.dropout = nn.Dropout(classifier_dropout)
self.classifier = nn.Linear(config.hidden_size, config.num_labels)
# Initialize weights and apply final processing
self.post_init()
@add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings(
checkpoint=_CHECKPOINT_FOR_TOKEN_CLASSIFICATION,
output_type=TokenClassifierOutput,
config_class=_CONFIG_FOR_DOC,
expected_output=_TOKEN_CLASS_EXPECTED_OUTPUT,
expected_loss=_TOKEN_CLASS_EXPECTED_LOSS,
)
def forward(
self,
input_ids: Optional[torch.Tensor] = None,
attention_mask: Optional[torch.Tensor] = None,
token_type_ids: Optional[torch.Tensor] = None,
position_ids: Optional[torch.Tensor] = None,
head_mask: Optional[torch.Tensor] = None,
inputs_embeds: Optional[torch.Tensor] = None,
labels: Optional[torch.Tensor] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[Tuple[torch.Tensor], TokenClassifierOutput]:
r"""
labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.
"""
return_dict = return_dict if return_dict is not None else self.config.use_return_dict
outputs = self.bert(
input_ids,
attention_mask=attention_mask,
token_type_ids=token_type_ids,
position_ids=position_ids,
head_mask=head_mask,
inputs_embeds=inputs_embeds,
output_attentions=output_attentions,
output_hidden_states=output_hidden_states,
return_dict=return_dict,
)
sequence_output = outputs[0]
sequence_output = self.dropout(sequence_output)
logits = self.classifier(sequence_output)
loss = None
if labels is not None:
loss_fct = CrossEntropyLoss()
loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
if not return_dict:
output = (logits,) + outputs[2:]
return ((loss,) + output) if loss is not None else output
return TokenClassifierOutput(
loss=loss,
logits=logits,
hidden_states=outputs.hidden_states,
attentions=outputs.attentions,
)
My full code is here
Can I get any help how to handle my imbalance dataset based on my problems?

Imbalance dataset in NER is highly multi-tasking sometimes, but there are several strategies you can employ to handle this issue.
You can handle class imbalance by assigning different weights to different classes in the Cross-Entropy Loss function.
Focal Loss is designed to address class imbalance by down-weighting well-classified examples. It focuses more on hard, misclassified examples. This can help in cases where the majority class dominates the loss calculation.
Dice Loss is another loss function commonly used for imbalanced datasets. It measures the overlap between predicted and target masks. This loss function tends to work well for tasks like segmentation but can also be adapted for NER.
You can actually modify your code to implement Weighted Cross-Entropy Loss, the similar code provided above should look like the followings using Python:
So, in carefully examine the above you can pass the class_weights parameter when you initialize your model. You can calculate class weights based on the frequency of each class in your dataset. Of course, it should looks like the following:
Adjust the values of class_weights according to the distribution of your classes. This approach will give more weight to minority classes, which can help in training with imbalanced datasets.
Check through and implement correctly, do not hesitate to ask more should there be any issue.
As a reference, you can also read more from this link:
A Step-by-Step Guide to handling imbalanced datasets in Python