When I run the following setup locally, the model trains (until I run out of memory), but running the identical code in the cloud gives me the following:
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`text` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
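Since the error points at the `text` feature, this is the sanity check I can run on the dataset (a minimal sketch; `data` is the DatasetDict passed to the trainer below):

# The `text` column should hold plain strings, not nested lists.
print(data['train'].features)           # expect something like {'text': Value(dtype='string'), ...}
print(type(data['train'][0]['text']))   # expect <class 'str'>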
Can anyone see any issues in my code that would cause this? I'm a little baffled that this seems to run locally but not in the cloud. Here is my code:
from transformers import TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

# model, tokenizer and data are loaded earlier in the notebook
response_template = "[/INST]"
tokenizer.padding_side = 'right'
collator = DataCollatorForCompletionOnlyLM(response_template=response_template, tokenizer=tokenizer, mlm=False)

gradient_accumulation_steps = 2
max_grad_norm = 0.3

args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_dir='./logs',
    logging_steps=10,
    eval_steps=10,
    do_eval=True,
    optim="adamw_torch_fused",
    gradient_accumulation_steps=gradient_accumulation_steps,
    evaluation_strategy="steps",
    max_grad_norm=max_grad_norm,
    lr_scheduler_type="constant",
    bf16=True,
    remove_unused_columns=False,
    report_to=['wandb'],
    save_strategy='no',
    learning_rate=0.0002,
    hub_model_id="LewisThorpe/text-to-sql-northwind",
)

trainer = SFTTrainer(
    model,
    args=args,
    train_dataset=data['train'],
    eval_dataset=data['validation'],
    dataset_text_field='text',
    data_collator=collator,
    packing=False,
    max_seq_length=1024,
)

trainer.train()
I've tried toggling the remove_unused_columns training argument, but setting it to True gives an IndexError instead.
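One thing I still need to rule out is the pad token, since from what I've read a missing pad_token can trigger this exact tensor-creation error (a minimal check; `tokenizer` is the same one passed to the collator above):

# If pad_token is None, padding variable-length batches will fail.
print(tokenizer.pad_token, tokenizer.pad_token_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # common workaround for Llama-style tokenizers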
Any guidance on this would be appreciated.