GPU memory already in use when pod starts
I have seen this happen multiple times across different GPU types and regions. When launching a pod, some of the GPU memory is already in use, and any attempt to make full use of the GPU's memory results in errors/crashes. For example, I have been trying to deploy 2xA100 GPUs in the Romania data center for the past hour. Each time I launch a pod, one of the GPUs already shows 40% of its memory in use, and attempting to utilize that GPU results in a crash. This is a screenshot of my GPU usage immediately after launching the pod, before any model had been loaded (or even downloaded). Restarting the pod and deleting/recreating the pod does not resolve the issue.
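For anyone who wants to check this on their own pod, here is a minimal sketch of how to print per-GPU usage right after launch, assuming the pod image ships with PyTorch (note that initializing CUDA itself reserves a small amount of memory on each GPU it touches):

```python
# Sketch: print per-GPU memory usage right after the pod starts,
# before anything has been loaded. Assumes PyTorch is in the pod image.
import torch

for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)  # driver-level numbers, in bytes
    used = total - free
    print(f"GPU {i}: {used / 1e9:.1f} / {total / 1e9:.1f} GB used "
          f"({100 * used / total:.0f}%)")
```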
If I'm paying to rent a GPU, I expect to be able to make full use of it, and not have half of the memory locked up for no apparent reason.
Oh, and I tried running koboldcpp in the CA region, which doesn't have this problem, but for some reason it is unable to create a Cloudflare URL there (this only happens in the CA region; I've seen it for 2+ months now).
Honestly, I'm getting very frustrated with Runpod's service and am strongly considering moving to a different provider. I spend half my time (and money) troubleshooting these errors rather than actually using the service I'm paying for.

3 Replies
Screenshot of trying to run koboldcpp in the CA datacenter:

Running nvidia-smi shows no processes using memory, yet 32 GB of memory is in use. This is ridiculous.
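For what it's worth, the same information nvidia-smi reports can be pulled straight from NVML inside the pod; a minimal sketch, assuming the pynvml bindings (e.g. the nvidia-ml-py package) are installed:

```python
# Sketch: query NVML for the same data nvidia-smi shows, i.e. per-GPU
# memory usage plus the list of compute processes holding memory.
# Assumes pynvml is installed (e.g. via the nvidia-ml-py package).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
        print(f"GPU {i}: {mem.used / 1e9:.1f} / {mem.total / 1e9:.1f} GB used, "
              f"{len(procs)} compute process(es)")
        for p in procs:
            print(f"  pid {p.pid}: {p.usedGpuMemory}")
finally:
    pynvml.nvmlShutdown()
```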

Sorry for the inconvenience. I’ve sent a message to the internal team to take a look.