I am using Mistral-7B to generate feedback given an accuracy input. After tokenization, the entries of my custom dataset look like this:
{'text': "<s>[INST] Generate a feedback on the performance which takes into account the input accuracy. here are the inputs 0 [/INST] \\n You're starting to get familiar. Let's keep practicing! </s>",
'instruction': 'Generate a feedback on the performance which takes into account the input accuracy.',
'input': '0',
'output': "You're starting to get familiar, which is the first step. Let's keep practicing!"}
Accuracy ranges from 0 to 100 and is grouped into intervals of 10, meaning that each group of 10 inputs corresponds to three different output sentences (e.g. "You're starting to get familiar. Let's keep practicing!"), for a total of 300 entries.
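To be explicit, the 'text' field is assembled from the other fields roughly like this (a minimal sketch; build_text is just an illustrative helper, not my actual preprocessing code):

def build_text(instruction: str, accuracy: str, output: str) -> str:
    # Assemble prompt and response in Mistral's [INST] chat format,
    # matching the 'text' field shown above (the literal "\n" is kept as-is).
    return f"<s>[INST] {instruction} here are the inputs {accuracy} [/INST] \\n {output} </s>"

entry = {
    "instruction": "Generate a feedback on the performance which takes into account the input accuracy.",
    "input": "0",
    "output": "You're starting to get familiar. Let's keep practicing!",
}
entry["text"] = build_text(entry["instruction"], entry["input"], entry["output"])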
I want to fine-tune the model, and I am aware that 300 entries is a small dataset. Below is the training loss:

As for the training code, I simply followed the tutorials for Mistral-7B fine-tuning:
model_name = "mistralai/Mistral-7B-Instruct-v0.2"
# Load the base model with QLoRA configuration
compute_dtype = getattr(torch, "float16")
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=True, # nested quantisation (additional quantisation)
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
base_model = AutoModelForCausalLM.from_pretrained(model_name,
quantization_config=bnb_config)
base_model.config.use_cache = False
base_model.config.pretraining_tp = 1
# Load MitsralAi tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    task_type="CAUSAL_LM",
    modules_to_save=["score"]
)
base_model = get_peft_model(base_model, peft_config)

# Set training parameters
training_arguments = TrainingArguments(
    output_dir="../gpublob/results",
    report_to="wandb",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    num_train_epochs=1,
    optim="paged_adamw_32bit",
    save_steps=1000,
    logging_steps=25,
    # eval_steps=STEPS_PER_EPOCH,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=False,
    bf16=False,
    max_grad_norm=0.3,
    max_steps=100000,  # the total number of training steps to perform
    warmup_ratio=0.03,
    group_by_length=True,
    lr_scheduler_type="constant"
)
trainer = SFTTrainer(
    model=base_model,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=None,  # You can specify the maximum sequence length here
    tokenizer=tokenizer,
    args=training_arguments,
    packing=False,
)

base_model.gradient_checkpointing_enable()
base_model = prepare_model_for_kbit_training(base_model)

# train
trainer.train()
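For reference, after training I sanity-check the outputs roughly like this (a minimal sketch; the accuracy value 50 and the generation settings are arbitrary):

# The tokenizer adds the <s> BOS token itself, so it is omitted from the prompt string.
prompt = "[INST] Generate a feedback on the performance which takes into account the input accuracy. here are the inputs 50 [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(base_model.device)
with torch.no_grad():
    generated = base_model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(generated[0], skip_special_tokens=True))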
Is my data structure correct?
Do you think augmenting the fine-tuning dataset would be crucial?