I am a newbie trying to learn fine-tuning. I started with the Falcon 7B Instruct LLM as my base model and want to fine-tune it on the Open Assistant instruct dataset. I have a 2080 Ti with 11 GB of VRAM, so I am using 4-bit quantization and LoRA.
These are the experiments I have done so far:
1> I trained with the SFTTrainer from Hugging Face for 25,000 steps; the loss decreased from 1.8 to 0.7. Below is the entire code I am using for training.
import torch, einops
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft.tuners.lora import LoraLayer
from trl import SFTTrainer
def create_and_prepare_model():
    compute_dtype = getattr(torch, "float16")
    # 4-bit NF4 quantization with double quantization (QLoRA-style)
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=compute_dtype,
        bnb_4bit_use_double_quant=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "tiiuae/falcon-7b-instruct", quantization_config=bnb_config, device_map={"": 0}, trust_remote_code=True
    )
    # LoRA applied to Falcon's fused attention projection
    peft_config = LoraConfig(
        lora_alpha=16,
        lora_dropout=0.1,
        r=64,
        bias="none",
        task_type="CAUSAL_LM",
        target_modules=["query_key_value"],
    )
    tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=True)
    tokenizer.pad_token = tokenizer.eos_token
    return model, peft_config, tokenizer
training_arguments = TrainingArguments(
    output_dir="./results_falcon-7b-instruct-new",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=10,
    optim="paged_adamw_32bit",
    save_steps=5,
    logging_steps=10,
    learning_rate=2e-4,
    fp16=True,
    max_grad_norm=0.3,
    max_steps=20,
    warmup_ratio=0.03,
    # group_by_length=True,
    lr_scheduler_type="constant",
)
model, peft_config, tokenizer = create_and_prepare_model()
model.config.use_cache = False
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=512,
    tokenizer=tokenizer,
    args=training_arguments,
    packing=True,
)
trainer.train()
trainer.save_model("falcon-instruct-7b-4bit-openassist-latest-new")
model.config.to_json_file("falcon-instruct-7b-4bit-openassist-latest-new/config.json")
It took about 53 hours, but the model just spits out gibberish when asked a simple question like "how are you?"
2> 300 steps: the loss went down from 1.8 to 1.5, but the model still spits out gibberish.
3> 40 steps: the loss went down from 1.8 to 1.7, but the model still spits out gibberish.
Any pointers that could give me a head start? Please suggest. Any open source code to do something similar will be greatly appreciated. Thanks a lot.
1) Match your prompts to the dataset format
Does what you are entering into the generation prompt look like what the model is being fine-tuned on?
Generally the LLM will generate the desired output when you use the same format the fine-tuning dataset was formatted in. This formatting helps "steer" or "contextualize" the generated text.
Alpaca-style datasets generally follow this format:
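A representative template (exact wording and whitespace vary a bit between Alpaca variants, and there is a second variant with an additional "### Input:" section):

Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:
{response}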
Vicuna-style datasets generally follow this format:
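For example (the timdettmers/openassistant-guanaco dataset used in the question follows this "### Human:/### Assistant:" style, possibly over several turns):

### Human: {question}
### Assistant: {answer}
### Human: {follow-up question}
### Assistant: {answer}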
Another format was formally described recently in the Microsoft Orca paper:
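Roughly (the exact rendering varies between the paper and Orca-style datasets):

### System:
{system message}

### User:
{user message}

### Assistant:
{response}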
Be mindful of line breaks (and end-of-text symbols, if the LLM has any from pretraining) in both your dataset and your prompt. A Vicuna-style inference prompt, for example:
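For a question like the one in the original post, the prompt would be wrapped like this, ending with the bare "### Assistant:" header (plus whatever line breaks the dataset uses) so the model completes the answer:

### Human: How are you?
### Assistant: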
If you are using transformers directly in Python to perform inference, you will have to add the "### Assistant:\n" line to the end of your prompt, minding how line breaks '\n' are handled in the dataset. LLMs are glorified auto-completes, if not quite stochastic parrots.
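A minimal sketch of that prompt construction (user_question is a placeholder; match the exact spaces and newlines to what the dataset actually contains):

user_question = "How are you?"
# The trailing "### Assistant:" header cues the model to generate the assistant turn.
prompt = f"### Human: {user_question}\n### Assistant:\n"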
The Vicuna format excels for chatbot fine-tuning. The Alpaca and Orca formats are useful for instruction-following models trained to provide information in a specific format. This topic is evolving, and in practice the nitty-gritty of prompt engineering is something most users never see or think about. That said, these formats are not magic; they are just one part of generating interpretable responses aligned with your intent, but they merit strict attention.
2) Once the prompt and dataset formats are all accounted for, go back to hyper-parameter optimization.
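For illustration only, these are the kinds of knobs in the TrainingArguments above worth sweeping once the prompt format is right; the specific numbers here are assumptions to start from, not recommendations based on measured results:

training_arguments = TrainingArguments(
    output_dir="./results_falcon-7b-instruct-new",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,   # effective batch size of 16 on a single GPU
    optim="paged_adamw_32bit",
    learning_rate=1e-4,               # sweep a range, e.g. 1e-4 to 3e-4, for LoRA
    lr_scheduler_type="cosine",       # compare against "constant"
    warmup_ratio=0.03,
    max_steps=500,                    # enough steps to see whether the loss curve flattens
    logging_steps=10,
    save_steps=100,
    fp16=True,
    max_grad_norm=0.3,
)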
3) Text Generation after 20 Training Steps
The following example works, though it throws a warning because the Falcon model is not yet well integrated into the Hugging Face transformers library.
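A minimal sketch of such a generation script, assuming the LoRA adapter was saved to "falcon-instruct-7b-4bit-openassist-latest-new" as in the training code above (the generation settings are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Same 4-bit config used for training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct",
    quantization_config=bnb_config,
    device_map={"": 0},
    trust_remote_code=True,
)
# Load the LoRA adapter saved by trainer.save_model(...)
model = PeftModel.from_pretrained(base_model, "falcon-instruct-7b-4bit-openassist-latest-new")
model.eval()

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Prompt in the same "### Human:/### Assistant:" style as the fine-tuning data;
# mind how the dataset handles line breaks after "### Assistant:"
prompt = "### Human: How are you?\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))

The response is whatever the model writes after the final "### Assistant:"; you may want to truncate at the next "### Human:" marker, since the model will often keep generating additional turns.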