CUDA out of memory
Hello, I am using the Runpod PyTorch 2.1 template.
I am trying to train a small model (phi), about 1.5 GB, and whatever I do, I keep getting a CUDA out-of-memory error from a process I can't identify. I am using a 3090 GPU, so I don't understand what the problem is.
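For reference, a quick way to check what is actually holding VRAM before training starts (a minimal sketch assuming plain PyTorch with CUDA, nothing specific to the Runpod image; running `nvidia-smi` in the pod terminal also lists which processes hold memory):

```python
# Minimal sketch: inspect GPU memory before training (plain PyTorch + CUDA).
import torch

assert torch.cuda.is_available()
free, total = torch.cuda.mem_get_info(0)  # bytes free / total on GPU 0
print(torch.cuda.get_device_name(0))
print(f"free: {free / 1e9:.2f} GB of {total / 1e9:.2f} GB")
print(f"allocated by this process: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
```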
5 Replies
you likely need more VRAM, whatever model you're running takes up too much of it
But this phi model is 1.5 GB, and I just tried an A40 and got the same problem.
Moreover, I don't see any fluctuation in GPU utilization on the website.
Yeah, I also tried it on an A100
something is wrong with the code then
AutoTrain? Should I open an issue? It's fine either way, I just want to be sure it isn't related to Runpod or the container, since I am able to run it locally on my computer
I think, considering that people like kopylk are able to run very large training sets, I'd be surprised if there was an issue with Runpod. Maybe something in the code is constantly pushing to VRAM without management (see the sketch below).
But I've used up a large amount of memory before for image generation and LLMs, where I've definitely run out, but after bumping up to something like an A100 I haven't had an issue.
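For anyone hitting this later: when a small model OOMs on every GPU size, a common culprit is the training loop retaining tensors across steps. A minimal sketch of the VRAM-friendly pattern (dummy model and data, hypothetical, not the actual phi/AutoTrain code from this thread):

```python
# Sketch of loop habits that keep VRAM flat (dummy model/data, hypothetical).
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(512, 512).to(device)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(32, 512, device=device)
    loss = model(x).pow(2).mean()
    loss.backward()
    opt.step()
    opt.zero_grad(set_to_none=True)  # frees grad tensors instead of zeroing them
    log_value = loss.item()          # .item() detaches; accumulating the loss
                                     # tensor itself keeps the whole graph alive

if device == "cuda":
    print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**20:.0f} MB")
```

If the loss tensor (or anything attached to the graph) is appended to a list for logging, every step's activations stay allocated and memory grows until it OOMs regardless of GPU size.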