Transformer pipeline with 'accelerate' not using gpu?


My transformers pipeline does not use CUDA.

code:

from transformers import pipeline, Conversation

# load_in_8bit: lower precision, but saves a lot of GPU memory
# device_map="auto": lets accelerate spread the model across available GPUs
chatbot = pipeline(
    "conversational",
    model="BramVanroy/GEITje-7B-ultra",
    model_kwargs={"load_in_8bit": True},
    device_map="auto",
)

Testing for CUDA works just fine:

import torch

print(torch.cuda.is_available())

Which prints True
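To see where the weights actually ended up, you can check the model's device attributes rather than just `torch.cuda.is_available()`. The snippet below is a minimal sketch using a tiny stand-in `nn.Linear` (not the 7B model from the question); the same `.device` check can be run on `chatbot.model`, and a pipeline built with `device_map="auto"` should also expose a per-module `hf_device_map`:

```python
import torch
from torch import nn

# Minimal sketch: a tiny stand-in model instead of the 7B pipeline.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = nn.Linear(4, 4).to(device)

# For the pipeline in the question you would inspect:
#   chatbot.model.device                          -> overall device
#   getattr(chatbot.model, "hf_device_map", None) -> placement from device_map="auto"
print(next(model.parameters()).device.type)  # "cuda" if the weights are on GPU
```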

I have a project with these libs:

[tool.poetry.dependencies]
python = "^3.11"
transformers = "^4.37.2"
torch = {version = "^2.2.0+cu121", source = "pytorch"}
torchvision = {version = "^0.17.0+cu121", source = "pytorch"}
accelerate = "^0.26.1"
bitsandbytes = "^0.42.0"

[[tool.poetry.source]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu121"
priority = "supplemental"

What am I missing?

There are 2 best solutions below

Rob Audenaerde:

Ok I'll just post more info here, in case anyone else ends up here:

  • The model has to be loaded from disk to the GPU. This takes CPU time and I/O.
  • You need enough GPU VRAM to hold the model.
  • On a GPU with more VRAM, drop load_in_8bit so you can use full precision and all of the compute.
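To check the VRAM point above before loading, you can query free GPU memory with `torch.cuda.mem_get_info()`. The sizing numbers in the comment are rough ballpark assumptions, not measurements of this specific model:

```python
import torch

# Rough sizing sketch: a 7B-parameter model needs on the order of 14 GB of
# VRAM in fp16, and very roughly half that with load_in_8bit (ballpark only).
if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"free VRAM: {free_bytes / 1e9:.1f} GB of {total_bytes / 1e9:.1f} GB")
else:
    print("no CUDA device visible")
```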

Yaoming Xuan:

All open-source models are loaded into CPU memory by default. You need to manually call `pipe = pipe.to("cuda:0")` or `pipe = pipe.cuda()` to run it on your GPU. If you want to fine-tune a model, you also need to move every other tensor that takes part in the computation to the same GPU device.
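The "same device" requirement can be illustrated without the 7B model. This sketch uses a small stand-in `nn.Linear` to show the pattern: weights and inputs are moved to the same device (the same idea as `pipe.to("cuda:0")`), otherwise the forward pass raises a device-mismatch RuntimeError:

```python
import torch
from torch import nn

# Sketch: model weights and input tensors must live on the same device.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = nn.Linear(8, 2).to(device)  # same idea as pipe.to("cuda:0")
x = torch.randn(1, 8).to(device)    # inputs moved to the model's device
out = model(x)
print(out.shape)                    # torch.Size([1, 2])
```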