torch.cuda.OutOfMemoryError: CUDA out of memory, but plenty of unused VRAM is available


I am currently using a system with 32 GB of DDR5 RAM, an i5 13th-gen CPU, and an NVIDIA RTX 4050 GPU with 6 GB of VRAM. I have successfully installed CUDA, PyTorch, and Transformers in my virtual environment, and I have downloaded the model into a local model directory.

I am running the following code for testing purposes:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

torch.cuda.empty_cache()

# Select the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

# Load the pre-trained model and tokenizer
model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir="./model")
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./model")

# Move the model to the appropriate device
model.to(device)

def get_model_response(input_text):
    # Tokenize the input text
    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Generate response from the model
    output = model.generate(input_ids, num_return_sequences=1)

    # Decode the generated response
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    return generated_text

# Example usage:
user_input = "What is the capital of India?"
response = get_model_response(user_input)
print("Model:", response)

I am getting this error when I call model.to(device), where device is cuda (and "cuda" is printed successfully):

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 172.00 MiB. GPU 0 has a total capacity of 6.00 GiB of which 0 bytes is free. Of the allocated memory 20.45 GiB is allocated by PyTorch, and 13.17 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
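For context, my understanding is that a 7B-parameter model in float32 needs about 7e9 × 4 bytes ≈ 28 GB for the weights alone (roughly 14 GB in float16), so it cannot fit in my 6 GB of VRAM either way, which would explain the OOM. From reading the Transformers docs, I think 4-bit quantization (roughly 3.5 GB of weights) might be the way to load it on my card. Here is a sketch of what I believe that would look like; I have not verified it, and it assumes the bitsandbytes and accelerate packages are installed:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "meta-llama/Llama-2-7b-chat-hf"

# Assumption: 4-bit quantization via the bitsandbytes package,
# which shrinks the 7B weights to roughly 3.5 GB
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

# device_map="auto" (requires accelerate) places layers on the GPU
# as they fit, so no explicit model.to(device) call is needed
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    cache_dir="./model",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir="./model")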

If I instead omit model.to(device) from my original script, I get this error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
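My understanding of this second error is that the tokenized input is moved to the GPU while the model weights are still on the CPU, so the embedding lookup sees tensors on two devices. A minimal sketch that reproduces the same mismatch (the layer and sizes here are just my own illustration):

import torch
import torch.nn as nn

# The weights stay on the CPU, as happens when model.to(device) is omitted
embedding = nn.Embedding(num_embeddings=32000, embedding_dim=8)

# The input IDs are moved to the GPU, like the tokenizer output in my script
token_ids = torch.tensor([[1, 2, 3]]).to("cuda")

# Raises: RuntimeError: Expected all tensors to be on the same device ...
hidden = embedding(token_ids)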

I am totally confused. I have spent the whole day on this but still cannot resolve it. Please help me; it is really important for me to understand why I am getting these errors on a build this powerful.
