Getting long text generation after fine-tuning the Mistral 7B model


I am fine-tuning the Mistral 7B model, and the fine-tuned model keeps generating long, automated text. I have set eos_token=True. Can someone please tell me how to add a word limit to the responses?

I tried adding max_length and truncation, but it still produces long text on its own. I expect one response per user query; instead, the model produces its own follow-up user question and then answers it. How do I keep the response short? Is it related to loading the tokenizer correctly?

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "mistralai/Mistral-7B-v0.1"

# 4-bit quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)

# Load the base model in 4-bit
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

model.config.use_cache = False
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = "right"
tokenizer.pad_token = tokenizer.unk_token
tokenizer.add_eos_token = True

# Attempt to cap response length
tokenizer.max_length = 200
tokenizer.truncation = True
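
For reference, this is roughly how I run inference (a minimal sketch, assuming a standard model.generate call; the prompt text is a placeholder). Even with max_new_tokens set, the output runs past a single answer:

# Illustrative inference call (assumed setup, not verbatim from my script)
prompt = "your prompt here"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

model.eval()
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=200,                   # hard cap on generated tokens
        eos_token_id=tokenizer.eos_token_id,  # stop when </s> is produced
        pad_token_id=tokenizer.pad_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))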

2 Answers

user1516492
# eval_tokenizer is loaded from the same checkpoint (model_id) used for fine-tuning
eval_tokenizer = AutoTokenizer.from_pretrained(model_id, add_bos_token=True, trust_remote_code=True)

eval_prompt = "your prompt here"
model_input = eval_tokenizer(eval_prompt, return_tensors="pt").to("cuda")

# ft_model is the fine-tuned model
ft_model.eval()
with torch.no_grad():
    output = ft_model.generate(**model_input, max_new_tokens=1024, repetition_penalty=1.15)
    print(eval_tokenizer.decode(output[0], skip_special_tokens=True))
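
Here max_new_tokens=1024 caps how much text generate can produce, repetition_penalty=1.15 discourages the model from looping, and skip_special_tokens=True strips </s> and padding tokens from the decoded string.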
Haider Asad

I am having the same issue with Mistral 7B Instruct v0.2 using PEFT and LoRA. My suspicions are the following:

  1. The padding side should be left, but after going through a lot of articles I set it to right and am still confused.

  2. The pad token should be the UNK token, not the EOS token (you did that, but it seems the output still overflows).

Here are the steps I followed for fine-tuning:

  1. The dataset was prepared in the format <s>[INST] <my instruction+input> [/INST] <my preferred output> </s>, keeping tokenizer.add_eos_token=False and tokenizer.add_bos_token=False since the special tokens are already written into the dataset (see the sketch below).
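
A minimal sketch of that formatting step (the field names, template, and max_length here are illustrative assumptions, not exactly my code):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
# The special tokens are written into the text itself, so the tokenizer must not add them again.
tokenizer.add_bos_token = False
tokenizer.add_eos_token = False

def format_example(example):
    # Hypothetical field names; adjust to your dataset's schema.
    return f"<s>[INST] {example['instruction']} [/INST] {example['output']} </s>"

sample = {"instruction": "your instruction + input here", "output": "your preferred output here"}
encoded = tokenizer(format_example(sample), truncation=True, max_length=512)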

When I run the fine-tuned model, it keeps generating until it hits the maximum token length.

What should the padding side be for the inference tokenizer?

UPDATE: setting the pad token to the UNK token, with the padding side kept as right, solved the problem.
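
For reference, the tokenizer setup corresponding to that update might look like this (a sketch; the checkpoint name is a placeholder):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
# Pad with UNK rather than EOS, and pad on the right, as described in the update above.
# (If EOS were used as the pad token and masked out during training, the model might never learn to stop.)
tokenizer.pad_token = tokenizer.unk_token
tokenizer.padding_side = "right"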