My transformers pipeline does not use CUDA.
Code:
from transformers import pipeline, Conversation
# load_in_8bit: lower precision, but saves a lot of GPU memory
# device_map="auto": lets accelerate place the model on the available GPU(s)
chatbot = pipeline("conversational", model="BramVanroy/GEITje-7B-ultra",
                   model_kwargs={"load_in_8bit": True}, device_map="auto")
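For reference, this is roughly how the pipeline is then used; Conversation is the documented input type for the conversational task in transformers 4.37 (the Dutch prompt is just a placeholder):

conversation = Conversation("Hoi, hoe gaat het?")
conversation = chatbot(conversation)
print(conversation.messages[-1]["content"])  # the model's reply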
Testing for CUDA works just fine:
import torch
print(torch.cuda.is_available())
This prints True.
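Note that torch.cuda.is_available() only confirms that PyTorch can see a GPU, not that the model actually landed on it. A quick check is to inspect the device map accelerate built (hf_device_map is set on the model when device_map is used) and where the weights actually live:

# which device did accelerate assign to each submodule?
print(chatbot.model.hf_device_map)
# where do the weights live? expect cuda:0, not cpu
print(next(chatbot.model.parameters()).device)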
I have a project with these libs:
[tool.poetry.dependencies]
python = "^3.11"
transformers = "^4.37.2"
torch = {version = "^2.2.0+cu121", source = "pytorch"}
torchvision = {version = "^0.17.0+cu121", source = "pytorch"}
accelerate = "^0.26.1"
bitsandbytes = "^0.42.0"
[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
priority = "supplemental"
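It is also worth verifying that Poetry actually resolved the CUDA wheel from that source rather than a CPU build:

import torch
print(torch.__version__)   # should end in "+cu121", not "+cpu"
print(torch.version.cuda)  # should print "12.1"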
What am I missing?
OK, I'll just post more info here in case anyone else ends up here: if your GPU has enough memory, skip load_in_8bit so you can use all the memory and compute.
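A minimal sketch of what that looks like, assuming the GPU has enough VRAM to hold the 7B model in half precision (the bfloat16 dtype is my assumption, not something from the original post):

import torch
from transformers import pipeline

# no load_in_8bit: full-speed compute at the cost of more VRAM;
# bfloat16 halves the footprint compared to float32
chatbot = pipeline("conversational", model="BramVanroy/GEITje-7B-ultra",
                   torch_dtype=torch.bfloat16, device_map="auto")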