Transformer pipeline with 'accelerate' not using gpu?


My transformers pipeline does not use CUDA.

code:

from transformers import pipeline, Conversation

# load_in_8bit: lower precision, but saves a lot of GPU memory
# device_map="auto": lets accelerate spread the model across available GPUs
chatbot = pipeline(
    "conversational",
    model="BramVanroy/GEITje-7B-ultra",
    model_kwargs={"load_in_8bit": True},
    device_map="auto",
)

Testing for CUDA works just fine:

import torch

print(torch.cuda.is_available())

Which prints True
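To see where the weights actually ended up, you can check the model's device attributes rather than just `torch.cuda.is_available()`. The snippet below is a minimal sketch using a tiny stand-in `nn.Linear` (not the 7B model from the question); the same `.device` check can be run on `chatbot.model`, and a pipeline built with `device_map="auto"` should also expose a per-module `hf_device_map`:

```python
import torch
from torch import nn

# Minimal sketch: a tiny stand-in model instead of the 7B pipeline.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = nn.Linear(4, 4).to(device)

# For the pipeline in the question you would inspect:
#   chatbot.model.device                          -> overall device
#   getattr(chatbot.model, "hf_device_map", None) -> placement from device_map="auto"
print(next(model.parameters()).device.type)  # "cuda" if the weights are on GPU
```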

I have a project with these libs:

[tool.poetry.dependencies]
python = "^3.11"
transformers = "^4.37.2"
torch = {version = "^2.2.0+cu121", source = "pytorch"}
torchvision = {version = "^0.17.0+cu121", source = "pytorch"}
accelerate = "^0.26.1"
bitsandbytes = "^0.42.0"

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
priority = "supplemental"

What am I missing?

There are 2 best solutions below

Rob Audenaerde:

Ok I'll just post more info here, in case anyone else ends up here:

  • The model has to be loaded from disk to the GPU. This takes CPU time and I/O.
  • You need enough GPU VRAM to hold the model.
  • On a GPU with more VRAM, drop load_in_8bit so you can use full precision and all of the compute.
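To check the VRAM point above before loading, you can query free GPU memory with `torch.cuda.mem_get_info()`. The sizing numbers in the comment are rough ballpark assumptions, not measurements of this specific model:

```python
import torch

# Rough sizing sketch: a 7B-parameter model needs on the order of 14 GB of
# VRAM in fp16, and very roughly half that with load_in_8bit (ballpark only).
if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"free VRAM: {free_bytes / 1e9:.1f} GB of {total_bytes / 1e9:.1f} GB")
else:
    print("no CUDA device visible")
```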

Yaoming Xuan:

All open-source models are loaded into CPU memory by default. You need to manually call `pipe = pipe.to("cuda:0")` or `pipe = pipe.cuda()` to run it on your GPU. If you want to fine-tune a model, you also need to move every other tensor that takes part in the computation to the same GPU device.
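The "same device" requirement can be illustrated without the 7B model. This sketch uses a small stand-in `nn.Linear` to show the pattern: weights and inputs are moved to the same device (the same idea as `pipe.to("cuda:0")`), otherwise the forward pass raises a device-mismatch RuntimeError:

```python
import torch
from torch import nn

# Sketch: model weights and input tensors must live on the same device.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = nn.Linear(8, 2).to(device)  # same idea as pipe.to("cuda:0")
x = torch.randn(1, 8).to(device)    # inputs moved to the model's device
out = model(x)
print(out.shape)                    # torch.Size([1, 2])
```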