Deploying an LLM on a SageMaker Endpoint - CUDA out of memory

I am trying to deploy an LLM to a SageMaker Endpoint using custom inference scripts, and I am getting this error:

CUDA out of memory. Tried to allocate 20.00 MiB. GPU 1 has a total capacty of 22.20 GiB of which 13.12 MiB is free. Process 13234 has 2.25 GiB memory in use. Process 13238 has 3.82 GiB memory in use. Process 13236 has 8.06 GiB memory in use. Process 13239 has 8.06 GiB memory in use. Of the allocated memory 6.93 GiB is allocated by PyTorch, and 49.59 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF : 400
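For context, I have not yet tried the max_split_size_mb hint from the message. My understanding is that it would be passed to the endpoint container as an environment variable at deploy time; a minimal sketch, assuming the SageMaker Python SDK's PyTorchModel (the S3 path, role, instance type, and split size below are placeholders, not my real values):

```python
from sagemaker.pytorch import PyTorchModel

# Sketch: forwarding the allocator hint from the error message to the
# endpoint container via an environment variable.
model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder
    role="my-sagemaker-execution-role",        # placeholder
    entry_point="inference.py",
    framework_version="2.0",
    py_version="py310",
    env={"PYTORCH_CUDA_ALLOC_CONF": "max_split_size_mb:128"},
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # placeholder multi-GPU instance
)
```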

The thing is, I was able to get responses for a couple of pings, and then I started getting this error. I am clearing the cache in my predict function (a simplified sketch follows the list below), but there are three things I cannot understand:

  1. Why am I getting this error after a couple of successful responses from the endpoint?
  2. What are the four processes consuming GPU memory, and why?
  3. And obviously, how can I resolve this error?
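Here is a simplified sketch of the kind of predict function I mean; the payload handling and generation arguments are illustrative, not my exact code:

```python
import torch

def predict_fn(input_data, model):
    # Simplified sketch; my real payload parsing and generation
    # arguments differ.
    try:
        with torch.no_grad():  # don't retain activations or grad buffers
            input_ids = input_data["input_ids"].to("cuda")
            output_ids = model.generate(input_ids, max_new_tokens=256)
        return output_ids.cpu()
    finally:
        # Return cached, freed blocks to the driver after every request;
        # even so, the endpoint eventually runs out of memory.
        torch.cuda.empty_cache()
```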