I've built a custom Named Entity Recognition (NER) model by adding a classification head on top of a pretrained Hugging Face transformer. The custom model extends PreTrainedModel and runs a linear classifier over the transformer's output. Here's the basic structure of my custom model class, BaseNERModel:
from transformers import PreTrainedModel, AutoConfig, AutoModel
from transformers.modeling_outputs import TokenClassifierOutput
import torch.nn as nn
class BaseNERModel(PreTrainedModel):
    def __init__(self, model_name, num_labels, id2label=None, label2id=None):
        config = AutoConfig.from_pretrained(
            model_name, output_attentions=True, output_hidden_states=True
        )
        super().__init__(config)
        self.num_labels = num_labels
        self.id2label = id2label
        self.label2id = label2id
        self.transformer = AutoModel.from_pretrained(model_name, config=config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, num_labels)
        self.loss_fct = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        labels=None,
        return_dict=True,
        output_attentions=False,
        output_hidden_states=False,
    ):
        # Implementation...
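For context, the elided forward body follows the standard token-classification pattern: run the backbone, apply dropout and the linear classifier, and compute a cross-entropy loss when labels are provided. Roughly (a condensed sketch, not the verbatim implementation):

def forward(
    self,
    input_ids=None,
    attention_mask=None,
    token_type_ids=None,
    labels=None,
    return_dict=True,
    output_attentions=False,
    output_hidden_states=False,
):
    # Per-token hidden states from the backbone.
    outputs = self.transformer(
        input_ids=input_ids,
        attention_mask=attention_mask,
        token_type_ids=token_type_ids,
        output_attentions=output_attentions,
        output_hidden_states=output_hidden_states,
        return_dict=True,
    )
    sequence_output = self.dropout(outputs.last_hidden_state)
    logits = self.classifier(sequence_output)  # (batch, seq_len, num_labels)

    loss = None
    if labels is not None:
        # ignore_index=-100 masks special tokens and padding positions.
        loss = self.loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

    return TokenClassifierOutput(
        loss=loss,
        logits=logits,
        hidden_states=outputs.hidden_states,
        attentions=outputs.attentions,
    )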
I have successfully trained this model on my NER task using the Hugging Face Trainer API, saving a checkpoint at each epoch. Each checkpoint directory contains files like these:
ls bert-base-uncased-ner-german-custom/checkpoint-20055/
config.json rng_state.pth tokenizer.json training_args.bin
model.safetensors scheduler.pt tokenizer_config.json vocab.txt
optimizer.pt special_tokens_map.json trainer_state.json
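For reference, training followed the standard Trainer recipe, roughly as below. The hyperparameter values are placeholders, and train_dataset, eval_dataset, and tokenizer are assumed to be prepared elsewhere:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-base-uncased-ner-german-custom",
    num_train_epochs=3,                 # placeholder values
    per_device_train_batch_size=16,
    save_strategy="epoch",              # one checkpoint-* directory per epoch
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,                        # the BaseNERModel instance from above
    args=training_args,
    train_dataset=train_dataset,        # tokenized, label-aligned NER data (assumed)
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)
trainer.train()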
However, I run into trouble when loading the trained model back from these checkpoint directories. My initial attempt was:
model = BaseNERModel.from_pretrained(
"path_to_checkpoint",
num_labels=len(id2label),
id2label=id2label,
label2id=label2id
).to(device)
Unfortunately, this does not restore the model as expected, and I suspect I'm missing something about how from_pretrained handles model classes that wrap a pretrained backbone with custom layers.
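One detail that may be relevant, if I understand the library correctly: from_pretrained instantiates the model class with the saved config as the first positional argument, whereas my __init__ expects a model name string and builds its own config. As a fallback, I assume I could rebuild the architecture by hand and load the checkpoint weights directly (untested sketch, reusing id2label, label2id, and device from above, and assuming the weights sit in model.safetensors as listed):

from safetensors.torch import load_file

# Rebuild the architecture exactly as it was at training time.
# "bert-base-uncased" is an assumption based on the checkpoint directory name;
# it must match the model_name used when the model was trained.
model = BaseNERModel(
    "bert-base-uncased",
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
)
# Overwrite the freshly initialized weights with the checkpoint's state dict.
state_dict = load_file("path_to_checkpoint/model.safetensors")
model.load_state_dict(state_dict)
model.to(device)
model.eval()

But this feels like a workaround, and I'd prefer to make from_pretrained work directly.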
Question: How can I correctly load a trained instance of BaseNERModel from a saved checkpoint? Are there any specific steps or modifications required to make from_pretrained work with custom model architectures in the Hugging Face Transformers library?