Cuda Error while trying to run oobabooga textgen as an api on kaggle notebooks

431 Views Asked by At

Beginner here trying to give Autogen a shot! I keep getting an error about cuda version being too old when i try to install oobabooga textgen web ui on kaggle notebook. the script works on google colab.

import torch
from pathlib import Path

if Path.cwd().name != 'text-generation-webui':
  print("Installing the webui...")

  !git clone https://github.com/oobabooga/text-generation-webui
  %cd text-generation-webui

  torver = torch.__version__
  print(f"TORCH: {torver}")
  is_cuda118 = '+cu118' in torver  # 2.1.0+cu118
  is_cuda117 = '+cu117' in torver  # 2.0.1+cu117

  textgen_requirements = open('requirements.txt').read().splitlines()
  if is_cuda117:
      textgen_requirements = [req.replace('+cu121', '+cu117').replace('+cu122', '+cu117').replace('torch2.1', 'torch2.0') for req in textgen_requirements]
  elif is_cuda118:
      textgen_requirements = [req.replace('+cu121', '+cu118').replace('+cu122', '+cu118') for req in textgen_requirements]
  with open('temp_requirements.txt', 'w') as file:
      file.write('\n'.join(textgen_requirements))

  !pip install -r requirements.txt --upgrade
  !pip install -r extensions/openai/requirements.txt --upgrade
  !pip install -r temp_requirements.txt --upgrade

  print("\033[1;32;1m\n --> If you see a warning about \"previously imported packages\", just ignore it.\033[0;37;0m")
  print("\033[1;32;1m\n --> There is no need to restart the runtime.\n\033[0;37;0m")

  try:
    import flash_attn
  except:
    !pip uninstall -y flash_attn

# Parameters
model_url ="typeof/dolphin-2.2.1-mistral-7b-sharded"
branch = ""
command_line_flags = "--n-gpu-layers 128 --load-in-4bit --use_double_quant"
api = False

if api:
  for param in ['--api', '--public-api']:
    if param not in command_line_flags:
      command_line_flags += f" {param}"

model_url = model_url.strip()
if model_url != "":
    if not model_url.startswith('http'):
        model_url = 'https://huggingface.co/' + model_url

    # Download the model
    url_parts = model_url.strip('/').strip().split('/')
    output_folder = f"{url_parts[-2]}_{url_parts[-1]}"
    branch = branch.strip('"\' ')
    if branch.strip() != '':
        output_folder += f"_{branch}"
        !python download-model.py {model_url} --branch {branch}
    else:
        !python download-model.py {model_url}
else:
    output_folder = ""

# Start the web UI
cmd = f"python server.py --extensions openai --share --public-api"
if output_folder != "":
    cmd += f" --model {output_folder}"
cmd += f" {command_line_flags}"
print(cmd)
!$cmd

The python version on both environments is 3.10, the only difference i could find was that cuda version on Kaggle is 11 and on google colab is 12.

does anyone have any idea whats going on and is there anyway to fix this issue and get the server to run on kaggle notebooks too.

on kaggle i keep getting this warning:

CUDA initialization: The NVIDIA driver on your system is too old (found version 11040). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

then the app doesn't use the GPUs. the notebook keeps running out of memory, the same script works on google colab and kaggle notebooks have twice the ram.

FYI: this serves as an openai API alternative for using open source LLMs with autogen

1

There are 1 best solutions below

0
user2287768 On

the issue was that pytorch version wasn't compatible with cuda version