I'm trying to fine-tune llama2-13b-chat-hf with an open-source dataset.
I've always used this template, but now I'm getting this error:
ImportError: Using bitsandbytes 8-bit quantization requires Accelerate: pip install accelerate and the latest version of bitsandbytes: pip install -i https://pypi.org/simple/ bitsandbytes
I installed all the packages required and these are the versions:
accelerate @ git+https://github.com/huggingface/accelerate.git@97d2168e5953fe7373a06c69c02c5a00a84d5344
bitsandbytes==0.42.0
datasets==2.17.1
huggingface-hub==0.20.3
peft==0.8.2
tokenizers==0.13.3
torch==2.1.0+cu118
torchaudio==2.1.0+cu118
torchvision==0.16.0+cu118
transformers==4.30.0
trl==0.7.11
Does anyone know if this is a version issue? How did you fix it?
I tried installing other versions, but nothing worked.
Have you tried accelerate test in your cmd terminal? If your installation is successful, this command should output a list of messages and a "test successful" at the end. If the command fails, something is wrong with your pytorch + accelerate environment, and you should reinstall them following the official tutorials. If the command succeeds and you still can't do multi-GPU finetuning, you should report the issue in bitsandbytes' GitHub repo.

Here are some other potential causes:
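Before reinstalling anything, it can help to confirm that the packages named in the ImportError are even visible to the interpreter you're running. A minimal stdlib-only sketch (the package names come from the error message; a common cause of this error is pip installing into a different interpreter than the one running your script):

```python
import importlib

def can_import(module_name):
    """Return True if the module imports cleanly in this interpreter."""
    try:
        importlib.import_module(module_name)
        return True
    except Exception:
        return False

# Check the packages the ImportError complains about.
for name in ("torch", "accelerate", "bitsandbytes", "transformers"):
    print(name, "OK" if can_import(name) else "MISSING or broken")
```

If accelerate or bitsandbytes shows up as missing here despite pip reporting it installed, you likely have two Python environments mixed up.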
Your CUDA version is too old. Most tools are built on 12.0+ nowadays. You should update CUDA with this link.
Your Python version should be 3.10+; otherwise you won't be able to install the latest tools with pip.
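To check the Python-version point above from inside your environment, a small stdlib sketch (the 3.10 threshold is this answer's recommendation, not a hard requirement of every package):

```python
import sys

def meets_minimum(version, minimum):
    """True if a version tuple is at least the given minimum tuple."""
    return tuple(version) >= tuple(minimum)

# Tuple comparison handles versions like (3, 11, 2) >= (3, 10) correctly.
print("Python", sys.version.split()[0],
      "OK" if meets_minimum(sys.version_info[:3], (3, 10)) else "too old, upgrade to 3.10+")
```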
Why do you want to train a quantized model? Quantization is meant to shrink a model for deployment, not for training, so this tool is not designed for your purpose. If you fine-tune a model with quantized parameters, the gradient updates won't have any effect, because they are simply too small to represent with only 8 bits. If you want to fine-tune an LLM with limited GPU memory, you should try LoRA or SFT; both can freeze some layers to reduce VRAM usage.
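To illustrate why LoRA keeps VRAM down: the base weight matrix W stays frozen, and only two small low-rank factors B and A are trained, so the effective weight is W + B·A. A toy pure-Python sketch of that idea (no torch or peft; the 4096 dimension and rank 8 are illustrative numbers, not values from the question):

```python
def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    rows, inner, cols = len(X), len(Y), len(Y[0])
    return [[sum(X[i][k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

def apply_lora(W, B, A):
    """Effective weight of a LoRA layer: frozen W plus low-rank update B @ A."""
    BA = matmul(B, A)
    return [[W[i][j] + BA[i][j] for j in range(len(W[0]))] for i in range(len(W))]

def lora_param_counts(d_out, d_in, r):
    """Trainable-parameter count: full fine-tune vs rank-r LoRA adapters."""
    full = d_out * d_in            # every entry of W is trainable
    lora = r * (d_out + d_in)      # only B (d_out x r) and A (r x d_in) train
    return full, lora

# A 4096x4096 projection at rank 8: roughly 256x fewer trainable parameters.
full, lora = lora_param_counts(4096, 4096, 8)
print(f"full fine-tune: {full:,} trainable params, LoRA r=8: {lora:,}")
```

The frozen W never receives gradient updates, which is why only the small adapter matrices (and their optimizer states) need to live in trainable memory.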