No CUDA GPU available after not using GPU for a while
Hi! I need some help with my GPU pod. It frequently shows no CUDA GPU available out of nowhere, and it only gets fixed if I restart the pod.
nvidia-smi output:
Failed to initialize NVML: Unknown Error
It's on Secure Cloud and on-demand.
If anyone faced a similar issue, please help.
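In case it helps with debugging, this is roughly what I run inside the pod to confirm the GPU has disappeared (just a sketch; it assumes PyTorch is available and that nvidia-ml-py (pynvml) has been pip-installed separately, which is an assumption on my side):

# Minimal sketch: check whether NVML and CUDA are still reachable from inside the pod.
# Assumes torch is installed (it is in the PyTorch template) and that
# nvidia-ml-py (pynvml) has been pip-installed separately.
import torch

try:
    import pynvml
    pynvml.nvmlInit()  # raises NVMLError when NVML is broken
    print("NVML OK,", pynvml.nvmlDeviceGetCount(), "GPU(s) visible")
    pynvml.nvmlShutdown()
except Exception as e:
    print("NVML check failed:", e)

print("torch.cuda.is_available():", torch.cuda.is_available())

When the pod is in the broken state, both the NVML check and torch.cuda.is_available() fail for me, matching the nvidia-smi output above.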
Same problem, bro (
Any idea why? I feel like they take the GPU away when it's idle, but we are paying continuously.
FAQ | RunPod Documentation: General questions about RunPod and its services.
Could you provide more info on what is happening, what template you are using, etc.?
Hi @Papa Madiator.
I am using the following template:
runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
On-Demand - Secure Cloud
Instance details:
1 x H100 80GB PCIe, 32 vCPU, 188 GB RAM
I don't think I have 0 GPUs assigned. I haven't stopped my instance since I started it, and it was working fine at first. Now it shows no CUDA GPUs available, and it's only fixed when I restart it. I've noticed it happens when I don't use the GPU for a couple of hours, even though my instance has been running the whole time and I'm being charged for it.
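To narrow down exactly when it breaks, I'm thinking of leaving a small watchdog running that shells out to nvidia-smi and logs a timestamp whenever it starts failing (a rough sketch; the 5-minute interval is arbitrary and nvidia-smi being on PATH inside the container is assumed):

# Sketch of a watchdog that logs the moment nvidia-smi starts (or stops) failing.
# Assumes nvidia-smi is on PATH inside the container; the interval is arbitrary.
import datetime
import subprocess
import time

last_ok = None
while True:
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    ok = result.returncode == 0
    if ok != last_ok:
        stamp = datetime.datetime.now().isoformat()
        detail = (result.stdout or result.stderr).strip()
        print(f"{stamp} nvidia-smi {'OK' if ok else 'FAILED'}: {detail}", flush=True)
        last_ok = ok
    time.sleep(300)

That should at least tell me whether the GPU disappears right after a fixed idle period or at random times.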
Do you mind sharing the pod ID?
dmri8voyci6eh2
Do you mind sending me the email connected to that RunPod account in a private message?
sent