How to Load a Custom-Layer Model Built on Hugging Face Transformers for an NER Task

I've built a custom Named Entity Recognition (NER) model by adding a classification head on top of a pretrained Hugging Face transformer. The custom model extends PreTrainedModel and runs a linear classifier over the transformer's token-level outputs. Here's the basic structure of my custom model class, BaseNERModel:

from transformers import PreTrainedModel, AutoConfig, AutoModel
from transformers.modeling_outputs import TokenClassifierOutput
import torch.nn as nn

class BaseNERModel(PreTrainedModel):
    def __init__(self, model_name, num_labels, id2label=None, label2id=None):
        # Build the backbone config first so PreTrainedModel.__init__ can use it
        config = AutoConfig.from_pretrained(
            model_name, output_attentions=True, output_hidden_states=True
        )
        super().__init__(config)

        self.num_labels = num_labels
        self.id2label = id2label
        self.label2id = label2id
        # Pretrained transformer backbone plus a token-level classification head
        self.transformer = AutoModel.from_pretrained(model_name, config=config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, num_labels)
        self.loss_fct = nn.CrossEntropyLoss(ignore_index=-100)

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        labels=None,
        return_dict=True,
        output_attentions=False,
        output_hidden_states=False,
    ):
        # Implementation...
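
The forward body is omitted above for brevity; it follows the standard token-classification pattern, roughly like this sketch (details of my actual implementation may differ):

    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None,
                labels=None, return_dict=True, output_attentions=False,
                output_hidden_states=False):
        # Encode the tokens with the backbone, then classify each hidden state
        outputs = self.transformer(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            return_dict=True,
        )
        sequence_output = self.dropout(outputs.last_hidden_state)
        logits = self.classifier(sequence_output)

        # Loss is computed only when labels are supplied (training/evaluation);
        # ignore_index=-100 skips special tokens and padding positions
        loss = None
        if labels is not None:
            loss = self.loss_fct(logits.view(-1, self.num_labels), labels.view(-1))

        return TokenClassifierOutput(loss=loss, logits=logits)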

I have successfully trained this model on my NER task using the Hugging Face Trainer API, saving a checkpoint at each epoch (see the training sketch after the directory listing). The saved checkpoint directories look like this:

ls bert-base-uncased-ner-german-custom/checkpoint-20055/
config.json              rng_state.pth            tokenizer.json           training_args.bin
model.safetensors        scheduler.pt             tokenizer_config.json    vocab.txt
optimizer.pt             special_tokens_map.json  trainer_state.json
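
For completeness, the checkpoints above come from a fairly standard Trainer setup, roughly like this (hyperparameter values and the dataset variables are placeholders):

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-base-uncased-ner-german-custom",
    save_strategy="epoch",  # writes one checkpoint-* directory per epoch
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,  # an instance of BaseNERModel
    args=training_args,
    train_dataset=train_dataset,  # tokenized NER dataset (placeholder)
    eval_dataset=eval_dataset,
)
trainer.train()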

However, I am encountering issues when attempting to load the trained model from these checkpoint directories for further use. My initial thought was to load the model as follows:

model = BaseNERModel.from_pretrained(
    "path_to_checkpoint",
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id
).to(device)

Unfortunately, this approach fails, and I suspect I'm missing something about how from_pretrained reconstructs models whose architecture adds custom layers on top of a base transformer.
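
One workaround that should recover the trained weights is to rebuild the architecture by hand and read the safetensors file directly, along these lines (a sketch; the checkpoint path and label maps are placeholders), but I'd rather get from_pretrained to work:

import torch
from safetensors.torch import load_file

checkpoint_dir = "bert-base-uncased-ner-german-custom/checkpoint-20055"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Rebuild the model exactly as it was constructed for training,
# then overwrite its weights with the trained checkpoint
model = BaseNERModel(
    "bert-base-uncased",
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
)
state_dict = load_file(f"{checkpoint_dir}/model.safetensors")
model.load_state_dict(state_dict)
model.to(device).eval()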

Question: How can I correctly load a trained instance of BaseNERModel from a saved checkpoint? Are there any specific steps or modifications required to make from_pretrained work with custom model architectures in the Hugging Face Transformers library?
