PyTorch GPU OOM with incorrect memory stats

I am running into a CUDA OOM error when trying to do torch.cat as part of a library I am using (ultralytics). While that is a problem in itself, the main issue I need help with is the absurd/incorrect memory-in-use stats printed to the console. Specifically, it says that I have

Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use

even though I have 6 GiB of GPU memory and 16 GB of regular DDR4 RAM. I am on a Windows 10 machine, with my code running inside an Ubuntu 22.04.3 LTS WSL 2 instance, and my .wslconfig is the following:

# Settings apply across all Linux distros running on WSL 2
[wsl2]

# Limits VM memory to use no more than 12 GB; this can be set as whole numbers using GB or MB
memory=12GB

swap=8GB

My error message in its entirety is:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 6.00 GiB of which 2.78 GiB is free. Including non-PyTorch memory, this process has 17179869184.00 GiB memory in use. Of the allocated memory 2.09 GiB is allocated by PyTorch, and 50.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
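
For what it's worth, 17179869184 is exactly 16 GiB counted in bytes, so I wonder whether that figure is really a byte count being printed with a GiB label. A minimal sketch of how I would cross-check the driver-level and PyTorch-level numbers (assuming a CUDA-enabled PyTorch build, with device 0 being my only GPU):

import torch

# Raw numbers from the CUDA driver, in bytes
free_b, total_b = torch.cuda.mem_get_info(0)
print(f"driver: {free_b / 2**30:.2f} GiB free of {total_b / 2**30:.2f} GiB total")

# What the PyTorch caching allocator has allocated and reserved, in bytes
print(f"allocated by PyTorch: {torch.cuda.memory_allocated(0) / 2**30:.2f} GiB")
print(f"reserved by PyTorch: {torch.cuda.memory_reserved(0) / 2**30:.2f} GiB")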

I have looked at my process list with tools like nvidia-smi, but there were no other processes using my GPU.
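
The error message also suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. If I understand the docs correctly, that has to be in place before the first CUDA allocation, e.g. exported in the shell before launching, or along these lines in Python (sketch):

import os

# Must be set before PyTorch initializes its CUDA caching allocator
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch  # imported afterwards so the allocator picks up the setting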
