RunPod•10mo ago
Ercan

Urgent: All new gpu pods are broken

Hi, our existing pods and the new pods we create all have the same issue: they cannot find CUDA devices. All of them give this error: Warning: caught exception 'CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.', memory monitor disabled
11 Replies
Ercan
ErcanOP•10mo ago
@ashleyk Tagging you in case you know whether something has changed in the SD web UI repo, or whether there is a general issue right now. All of our prod is down.
ashleyk
ashleyk•10mo ago
You can use @Papa Madiator's GPU analyser tool in #📚|resources. What GPU type and region? And is it secure cloud or community cloud?
Ercan
ErcanOP•10mo ago
It is all GPUs in community cloud. You can try one as well; it reproduces on all 4090 pods.
ashleyk
ashleyk•10mo ago
I don't have issues with community cloud or 4090. Which template are you using?
Ercan
ErcanOP•10mo ago
[Attachment: no description]
ashleyk
ashleyk•10mo ago
I don't use this template so can't help unfortunately; maybe @TheLastBen can advise why it's unable to find the GPU.
Ercan
ErcanOP•10mo ago
Just launched a new pod on secure cloud, and it seems to be working. But community cloud has problems right now, with either the CUDA version or something else.
TheLastBen
TheLastBen•10mo ago
just started a fresh 4090 pod, it works fine now
ashleyk
ashleyk•10mo ago
@xPaghkman next time it happens, provide the pod ID so RunPod can check it out and perhaps delist the machine if it's broken.
Ercan
ErcanOP•10mo ago
One of them is g9tct1kr323mh0. I stopped the others and can't find their IDs. Anyway, we just moved all our processes to secure cloud. Have you tried running cuda is_available?
TheLastBen
TheLastBen•10mo ago
I just generated an image
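For anyone hitting the same error, the is_available check mentioned above could be sketched roughly like this. It is only a minimal sketch, not something from the thread: it assumes the pod has Python, that `nvidia-smi` may or may not be on the PATH, and that PyTorch may or may not be installed. The function name `cuda_diagnostics` and the report keys are made up for illustration.

```python
import importlib.util
import os
import shutil
import subprocess


def cuda_diagnostics():
    """Collect basic facts about GPU visibility (hypothetical helper)."""
    report = {
        "cuda_visible_devices": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_smi": None,
        "torch_installed": False,
        "cuda_available": None,
        "device_count": None,
    }

    # Driver-level check: is nvidia-smi present, and does it list any GPUs?
    smi = shutil.which("nvidia-smi")
    if smi is not None:
        try:
            result = subprocess.run(
                [smi, "-L"], capture_output=True, text=True, timeout=30
            )
            report["nvidia_smi"] = result.stdout.strip() or result.stderr.strip()
        except (OSError, subprocess.TimeoutExpired) as exc:
            report["nvidia_smi"] = f"failed: {exc}"

    # Framework-level check: only attempted when PyTorch is installed.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
        except ImportError:
            return report
        report["torch_installed"] = True
        report["cuda_available"] = torch.cuda.is_available()
        report["device_count"] = torch.cuda.device_count()

    return report


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `nvidia_smi` lists a GPU but `cuda_available` is False, the problem is more likely in the container or framework setup than in the host machine. Posting this output together with the pod ID should make it easier for RunPod to check the machine.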