VRAM stuck at 77% usage
VRAM usage is stuck at 77% on 1 of my 4 GPUs. I've already tried a restart, a hard stop and start, and a reset. I don't want to switch pods because I have hundreds of GB of data on the volume that would take a long time to set up again. Is there anything else I can do?
Tried the reset again. Still stuck.
ID: ox02c3pvm058j3
4 Replies
Can you check nvidia-smi?
nvidia-smi --gpu-reset -i 0
Doesn't that fix it? (Replace 0 with the index of the stuck GPU.) Otherwise, look for any rogue processes (e.g., python, torch, tensorflow) that might be keeping memory occupied. If you see any, find their process IDs and kill them manually.
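For the rogue-process check, here is a rough sketch, assuming a Linux pod with the NVIDIA driver utilities installed. The helper name list_gpu_pids is made up for illustration; it reads the CSV that nvidia-smi prints from stdin, so you can also try it on saved output.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of nvidia-smi): print the PIDs found in
# `nvidia-smi --query-compute-apps=... --format=csv` output read from stdin.
list_gpu_pids() {
  # Columns are "pid, process_name, used memory"; the header row is skipped
  # because its first field is not a number.
  awk -F', *' '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Usage on the pod (commented out so that sourcing this file runs nothing):
# nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv | list_gpu_pids
# kill <pid>       # ask the process to exit first
# kill -9 <pid>    # only if it ignores the normal signal
```

If the list comes back empty while memory still shows as used, killing processes from inside the pod probably won't help.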
What if nvidia-smi doesn't show any process? Like this one: https://canary.discord.com/channels/912829806415085598/1309156937786593300/1309156937786593300
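One more thing worth trying in that case, as a sketch only: a process can hold the GPU device files open without appearing in the nvidia-smi process list, which often happens inside containers. The helper below (find_device_holders is a made-up name) scans /proc for open file descriptors pointing at /dev/nvidia*. It assumes the process is visible from inside the pod; anything running on the host won't show up here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PIDs that hold an open file descriptor on a
# device whose path starts with the given prefix (default: /dev/nvidia).
# The proc root is a parameter only so the function can be tried on a fake tree.
find_device_holders() {
  local proc_root="${1:-/proc}" prefix="${2:-/dev/nvidia}"
  local fd target pid
  for fd in "$proc_root"/[0-9]*/fd/*; do
    # Skip descriptors we cannot read (other users' processes, races).
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      "$prefix"*)
        pid=${fd#"$proc_root"/}   # e.g. "123/fd/4"
        echo "${pid%%/*}"         # keep only the PID part
        ;;
    esac
  done | sort -un
}

# Usage on the pod (commented out so that sourcing this file runs nothing):
# find_device_holders
# ps -o pid,cmd -p <pid>   # see what the process is before killing it
```

If this also prints nothing, the memory is most likely held outside the pod.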
Open a support ticket and we can take a look.
At least for his pod, there’s nothing occupying VRAM before he starts, and after stopping the pod, everything is released as expected.