Repo id must use alphanumeric chars: error while auto-training an LLM


I am trying to fine-tune an LLM with AutoTrain, but I am getting the error below:

**Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'data/'.**

I checked the AutoTrain community forum, and a few users there are facing the same issue with no resolution.

Here is a snippet of my code:

push_to_hub = False
hf_token = "hf_@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
project_name = 'mmy-llm'
model_name = 'meta-llama/Llama-2-7b'

learning_rate = 2e-4
num_epochs = 4
batch_size = 1
block_size = 1024
trainer = "sft"
warmup_ratio = 0.1
weight_decay = 0.01
gradient_accumulation = 4
use_peft = True
use_int4 = True
lora_r = 16
lora_alpha = 32
lora_dropout = 0.045
quantization = "int4"

import os
os.environ["PROJECT_NAME"] = project_name
os.environ["MODEL_NAME"] = model_name
os.environ["PUSH_TO_HUB"] = str(push_to_hub)
os.environ["HF_TOKEN"] = hf_token
os.environ["REPO_ID"] = repo_id
os.environ["LEARNING_RATE"] = str(learning_rate)
os.environ["NUM_EPOCHS"] = str(num_epochs)
os.environ["BATCH_SIZE"] = str(batch_size)
os.environ["BLOCK_SIZE"] = str(block_size)
os.environ["WARMUP_RATIO"] = str(warmup_ratio)
os.environ["WEIGHT_DECAY"] = str(weight_decay)
os.environ["GRADIENT_ACCUMULATION"] = str(gradient_accumulation)
os.environ["USE_PEFT"] = str(use_peft)
os.environ["USE_INT4"] = str(use_int4)
os.environ["LORA_R"] = str(lora_r)
os.environ["LORA_ALPHA"] = str(lora_alpha)
os.environ["LORA_DROPOUT"] = str(lora_dropout)

!autotrain llm \
--train \
--model ${MODEL_NAME} \
--project-name ${PROJECT_NAME} \
--data-path data/ \
--text-column text \
--lr ${LEARNING_RATE} \
--batch-size ${BATCH_SIZE} \
--epochs ${NUM_EPOCHS} \
--block-size ${BLOCK_SIZE} \
--warmup-ratio ${WARMUP_RATIO} \
--lora-r ${LORA_R} \
--lora-alpha ${LORA_ALPHA} \
--lora-dropout ${LORA_DROPOUT} \
--weight-decay ${WEIGHT_DECAY} \
--gradient-accumulation ${GRADIENT_ACCUMULATION} \
--use-peft
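
One way to sanity-check the data itself is to load it with the datasets library directly. The snippet below is only a rough sketch: it assumes data/ contains a train.csv file with a text column, which is just an illustrative layout.

from datasets import load_dataset

# Load the local CSV explicitly, without going through the Hub.
# (Assumes data/train.csv exists and contains a "text" column.)
ds = load_dataset("csv", data_files="data/train.csv", split="train")
print(ds.column_names)  # a "text" column should appear here

If this loads cleanly, the data file itself is readable, so the failure would be in how the --data-path value is resolved rather than in the file.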

The detailed error is:

ERROR | 2024-03-22 15:55:05 | autotrain.trainers.common:wrapper:93 - train has failed due to an exception: Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/common.py", line 90, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/clm/main.py", line 128, in train
    train_data, valid_data = process_input_data(config)
  File "/usr/local/lib/python3.10/dist-packages/autotrain/trainers/clm/main.py", line 71, in process_input_data
    train_data = load_dataset(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 2129, in load_dataset
    builder_instance = load_dataset_builder(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1815, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1512, in dataset_module_factory
    raise e1 from None
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1479, in dataset_module_factory
    raise e
  File "/usr/local/lib/python3.10/dist-packages/datasets/load.py", line 1453, in dataset_module_factory
    dataset_info = hf_api.dataset_info(
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 164, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'data/'.

❌ ERROR | 2024-03-22 15:55:05 | autotrain.trainers.common:wrapper:94 - Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'data/'.
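
The last step of that traceback can be reproduced on its own. The snippet below is a minimal sketch calling the same validator that appears in the traceback; I am assuming validate_repo_id is importable from huggingface_hub.utils, which is where the installed package appears to expose it.

from huggingface_hub.utils import HFValidationError, validate_repo_id

try:
    # "data/" is exactly the value passed via --data-path above.
    validate_repo_id("data/")
except HFValidationError as err:
    print(err)  # prints the same "Repo id must use alphanumeric chars ..." message

So the --data-path value seems to end up being validated as a Hub repo id instead of being resolved as a local folder.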

I have looked for a solution in every community thread and article I could find. The issue seems common among users running AutoTrain on Colab, but I could not find a specific fix.
