Hugging Face SFTTrainer fails with an index error in a cloud environment but works locally


When I run the following setup locally, the model trains (until I run out of memory), but running the identical code in the cloud gives me the following:

ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`text` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
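From what I understand, this error usually means the collator received a column whose entries are nested lists rather than plain strings or token IDs. A quick sanity check (just a rough sketch, assuming `data` is the `DatasetDict` used in the code below and `text` is meant to be a plain string column) would be something like:

print(type(data["train"][0]["text"]))  # expect <class 'str'>, not a list
print(data["train"].features)          # expect a string Value for the `text` column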

Can anyone see any issues in my code that would cause this? I'm a little baffled that this seems to run locally but not in the cloud. Here is my code:

from transformers import TrainingArguments
from trl import SFTTrainer, DataCollatorForCompletionOnlyLM

response_template = "[/INST]"
tokenizer.padding_side = 'right'
collator = DataCollatorForCompletionOnlyLM(response_template=response_template, tokenizer=tokenizer, mlm=False)
gradient_accumulation_steps = 2
max_grad_norm = 0.3


args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=4,
    logging_dir='./logs',
    logging_steps=10,
    eval_steps=10,
    do_eval=True,
    optim="adamw_torch_fused",
    gradient_accumulation_steps=gradient_accumulation_steps,
    evaluation_strategy="steps",
    max_grad_norm=max_grad_norm,
    lr_scheduler_type="constant",
    bf16=True,
    remove_unused_columns=False,
    report_to=['wandb'],
    save_strategy='no',
    learning_rate=0.0002,
    hub_model_id="LewisThorpe/text-to-sql-northwind"
)

trainer = SFTTrainer(
    model,
    args=args,
    train_dataset=data['train'],
    eval_dataset=data['validation'],
    dataset_text_field='text',
    data_collator=collator,
    packing=False,
    max_seq_length=1024
)

trainer.train()

I've tried toggling the remove_unused_columns training argument, but that raises an index error when it's set to True.
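Since the identical code behaves differently locally and in the cloud, I'm also wondering whether a library version mismatch between the two environments could be to blame. A minimal check I could run in both places (just printing the installed versions, nothing environment-specific assumed) would be:

import transformers, trl, datasets, torch

# Compare these between the local and cloud environments to rule out an API mismatch.
print("transformers:", transformers.__version__)
print("trl:", trl.__version__)
print("datasets:", datasets.__version__)
print("torch:", torch.__version__)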

Any guidance on this would be appreciated.
