RunPod · 8mo ago
bunny

No CUDA GPU available after not using GPU for a while

Hi! I need some help with my GPU pod. It frequently reports that no CUDA GPU is available, seemingly out of nowhere, and this only gets fixed when I restart the pod. nvidia-smi output: "Failed to initialize NVML: Unknown Error". It's on Secure Cloud and On-Demand. If anyone has faced a similar issue, please help.
9 Replies
Rin Rivz · 8mo ago
Same problem, bro (
bunny (OP) · 8mo ago
Any idea why? I feel like they take the GPU away if it's idle, but we are paying continuously.
Madiator2011 · 8mo ago
Could you provide more info about what is happening, which template you use, etc.?
bunny (OP) · 8mo ago
Hi @Papa Madiator. I am using the following template: runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
On-Demand - Secure Cloud
Instance details: 1 x H100 80GB PCIe, 32 vCPU, 188 GB RAM
I don't think I have 0 GPUs assigned. I haven't stopped my instance since I started it, and it was working fine at the start. Now it shows no CUDA GPUs available, and that gets fixed when I restart it. I have noticed it happens when I don't use the GPU for a couple of hours (the instance is running the whole time and I am being charged for it).
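For anyone hitting the same symptom, here is a minimal sketch of a check that could be run periodically (for example from cron) to record when the GPU drops out. It assumes `nvidia-smi` is on the PATH inside the pod. The function names `classify_nvidia_smi` and `check_gpu` are hypothetical, and the only detail taken from this thread is the error string. It does not fix anything; it only helps pin down when the failure starts.

```python
import shutil
import subprocess

# Error text reported in this thread when the driver connection is lost.
NVML_ERROR = "Failed to initialize NVML"


def classify_nvidia_smi(returncode: int, output: str) -> str:
    """Map an nvidia-smi exit code and its combined output to a coarse status."""
    if NVML_ERROR in output:
        return "nvml_error"
    if returncode != 0:
        return "error"
    return "ok"


def check_gpu(timeout: float = 30.0) -> str:
    """Run nvidia-smi once and return a status string (hypothetical helper)."""
    if shutil.which("nvidia-smi") is None:
        return "not_found"
    try:
        proc = subprocess.run(
            ["nvidia-smi"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return classify_nvidia_smi(proc.returncode, proc.stdout + proc.stderr)


if __name__ == "__main__":
    print(check_gpu())
```

Logging this status with a timestamp every few minutes would show whether the failure really lines up with idle periods, which is useful to include when reporting the pod ID to support.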
Madiator2011 · 8mo ago
Do you mind sharing the pod ID?
bunny (OP) · 8mo ago
dmri8voyci6eh2
Madiator2011 · 8mo ago
Do you mind sending me the email connected to that RunPod account in a private message?
bunny (OP) · 8mo ago
Sent.