I am fine-tuning the Mistral-7B model, and the fine-tuned model keeps generating long, automated text. I have set add_eos_token=True on the tokenizer. How can I add a word (token) limit to the responses?
I tried adding max_length and truncation, but the model still produces long text on its own. I expect a single response for a single user query; instead, the model invents its own follow-up user question and then answers that too. How do I keep the response short? Is this related to loading the tokenizer correctly? My setup is below, followed by a rough sketch of how I run inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_model = "mistralai/Mistral-7B-v0.1"

# 4-bit (NF4) quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,  # load_in_4bit is already set inside bnb_config
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False  # required when gradient checkpointing is enabled
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
tokenizer.padding_side = 'right'
tokenizer.pad_token = tokenizer.unk_token
tokenizer.add_eos_token = True

# What I tried in order to limit the output length:
tokenizer.max_length = 200
tokenizer.truncation = True
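
For reference, my inference call looks roughly like this (the prompt format and argument values here are approximations rather than my exact code):

# Rough sketch of how I generate a response for one user query
prompt = "### Instruction: <user query here>\n### Response:"
inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=200).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)  # cap on newly generated tokens
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Even with a cap like this, the model fills the whole token budget by inventing a follow-up user question and answering it, instead of stopping after one response.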