Error while loading Gemma model (bitsandbytes)


I am trying to quantize the Gemma model and save it to my local system. I have already tried uninstalling and reinstalling CUDA 11.8, which didn't help. Here is the code:

import os
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Keep the Hugging Face download cache on the E: drive.
os.environ["HUGGINGFACE_HUB_CACHE"] = r"E:\gemma\gemma-cache"
access_token = "hf_..."  # access token redacted

# Load the model with 4-bit quantization via bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", use_auth_token=access_token)
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", quantization_config=quant_config, use_auth_token=access_token)

# Save the quantized model and tokenizer locally.
model.save_pretrained(r"E:\gemma\gemma-2b-quantized")
tokenizer.save_pretrained(r"E:\gemma\gemma-2b-quantized")
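
Side note: the FutureWarnings in the log below are only because `use_auth_token` is deprecated in recent transformers versions in favor of `token`; they are unrelated to the crash. Assuming `access_token` and `quant_config` are defined as above, the equivalent non-deprecated calls would be:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Same calls as above, using the newer `token` argument instead of `use_auth_token`.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b", token=access_token)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b", quantization_config=quant_config, token=access_token
)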

When I run this code, I get the error below; the problem seems to be with bitsandbytes:

E:\Project\test\.venv\Scripts\python.exe E:\Project\test\save_model.py 
E:\Project\test\.venv\Lib\site-packages\transformers\models\auto\tokenization_auto.py:720: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
E:\Project\test\.venv\Lib\site-packages\transformers\models\auto\auto_factory.py:466: FutureWarning: The `use_auth_token` argument is deprecated and will be removed in v5 of Transformers. Please use `token` instead.
  warnings.warn(
`low_cpu_mem_usage` was None, now set to True since model is quantized.
False

===================================BUG REPORT===================================
E:\Project\test\.venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py:167: UserWarning: Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes


  warn(msg)
================================================================================
The following directories listed in your path were found to be non-existent: {WindowsPath('AQAAANCMnd8BFdERjHoAwE/Cl+sBAAAA1Pyvkz1QTUmBL6tjYQCZ/gQAAAACAAAAAAAQZgAAAAEAACAAAACFzXyXPfw5dY7CS8Lzl5RoN0zRcNaG4js+u0VQ1s1bdAAAAAAOgAAAAAIAACAAAADZhVIb9jAEIP8jjAj4asYvQJmN3Ql+dmvUK7Z482joX2AAAAAWrQM1Dz3PUVV7e/OTmTOcqpdfl8ko1A9DHp2mvkmm3SB+K9jE7kaX4wn0YKF50YAFaZUH+JIBUlRzxeRB70862e9S4R6okHs0Qu9xMTSg9EAHPdVoDzqa0w5Jt6mGOClAAAAASn8EwxIg5ei0uf2fJj6HK72s1khHGzGNbs0DtLI/meugsnVFMXTK5EbhWdvU5EWbcmx4snfZJQBfPIQgqKaGSw==')}
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
DEBUG: Possible options found for libcudart.so: set()
CUDA SETUP: PyTorch settings found: CUDA_VERSION=118, Highest Compute Capability: 8.9.
CUDA SETUP: To manually override the PyTorch CUDA version please see:https://github.com/TimDettmers/bitsandbytes/blob/main/how_to_use_nonpytorch_cuda.md
CUDA SETUP: Loading binary E:\Project\test\.venv\Lib\site-packages\bitsandbytes\libbitsandbytes_cuda118.so...
[WinError 193] %1 is not a valid Win32 application
CUDA SETUP: Problem: The main issue seems to be that the main CUDA runtime library was not detected.
CUDA SETUP: Solution 1: To solve the issue the libcudart.so location needs to be added to the LD_LIBRARY_PATH variable
CUDA SETUP: Solution 1a): Find the cuda runtime library via: find / -name libcudart.so 2>/dev/null
CUDA SETUP: Solution 1b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_1a
CUDA SETUP: Solution 1c): For a permanent solution add the export from 1b into your .bashrc file, located at ~/.bashrc
CUDA SETUP: Solution 2: If no library was found in step 1a) you need to install CUDA.
CUDA SETUP: Solution 2a): Download CUDA install script: wget https://raw.githubusercontent.com/TimDettmers/bitsandbytes/main/cuda_install.sh
CUDA SETUP: Solution 2b): Install desired CUDA version to desired location. The syntax is bash cuda_install.sh CUDA_VERSION PATH_TO_INSTALL_INTO.
CUDA SETUP: Solution 2b): For example, "bash cuda_install.sh 113 ~/local/" will download CUDA 11.3 and install into the folder ~/local
Traceback (most recent call last):
  File "E:\Project\test\.venv\Lib\site-packages\transformers\utils\import_utils.py", line 1390, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\C files\Lib\importlib\__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 995, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "E:\Project\test\.venv\Lib\site-packages\transformers\integrations\bitsandbytes.py", line 11, in <module>
    import bitsandbytes as bnb
  File "E:\Project\test\.venv\Lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "E:\Project\test\.venv\Lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "E:\Project\test\.venv\Lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "E:\Project\test\.venv\Lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "E:\Project\test\.venv\Lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "E:\Project\test\.venv\Lib\site-packages\bitsandbytes\cextension.py", line 20, in <module>
    raise RuntimeError('''
RuntimeError: 
        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "E:\Project\test\save_model.py", line 8, in <module>
    model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", quantization_config=quant_config, use_auth_token=access_token)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\test\.venv\Lib\site-packages\transformers\models\auto\auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\test\.venv\Lib\site-packages\transformers\modeling_utils.py", line 3389, in from_pretrained
    hf_quantizer.preprocess_model(
  File "E:\Project\test\.venv\Lib\site-packages\transformers\quantizers\base.py", line 166, in preprocess_model
    return self._process_model_before_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\test\.venv\Lib\site-packages\transformers\quantizers\quantizer_bnb_4bit.py", line 256, in _process_model_before_weight_loading
    from ..integrations import get_keys_to_not_convert, replace_with_bnb_linear
  File "<frozen importlib._bootstrap>", line 1412, in _handle_fromlist
  File "E:\Project\test\.venv\Lib\site-packages\transformers\utils\import_utils.py", line 1380, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\Project\test\.venv\Lib\site-packages\transformers\utils\import_utils.py", line 1392, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.integrations.bitsandbytes because of the following error (look up to see its traceback):

        CUDA Setup failed despite GPU being available. Please run the following command to get more information:

        python -m bitsandbytes

        Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
        to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
        and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues

Process finished with exit code 1
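
One thing that stands out in the log: bitsandbytes is trying to load libbitsandbytes_cuda118.so, which is a Linux shared library, on Windows, and that matches the "[WinError 193] %1 is not a valid Win32 application" message (older bitsandbytes releases shipped no native Windows binaries). A minimal sanity check, assuming torch and bitsandbytes are installed in this same venv, to see what the environment actually reports:

import torch

# Confirm that PyTorch itself sees the GPU and which CUDA build it ships with.
print(torch.__version__)          # expect something like 2.x.x+cu118
print(torch.version.cuda)         # expect "11.8", matching CUDA_VERSION=118 in the log
print(torch.cuda.is_available())  # expect True

# Importing bitsandbytes re-runs its CUDA setup; in this broken environment
# it raises the same RuntimeError shown in the traceback above.
import bitsandbytes
print(bitsandbytes.__version__)

The log's own suggestion, running python -m bitsandbytes from the command line, prints the same diagnostics.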
