RunPod•14mo ago

Urgent: All new GPU pods are broken

Hi, our existing pods and the new pods we create all have the same issue: they cannot find CUDA devices. All of them give this error: Warning: caught exception 'CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.', memory monitor disabled
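Because the error mentions CUDA_VISIBLE_DEVICES, a quick first check is whether the driver exposes any GPU to the pod at all, independent of the web UI. The sketch below is not a RunPod tool; it assumes the NVIDIA driver's `nvidia-smi` CLI is normally on PATH inside the container, and the function name `gpu_visibility_report` is hypothetical. It prints the environment variable and the GPUs that `nvidia-smi -L` lists.

```python
import os
import shutil
import subprocess


def gpu_visibility_report() -> dict:
    """Collect basic facts about GPU visibility on the current machine.

    Assumption: the NVIDIA driver ships the `nvidia-smi` CLI on PATH;
    if it is missing, the driver is likely not exposed to the container.
    """
    report = {
        # None means the variable is unset, so all GPUs should be visible.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        "gpus": [],
        "error": None,
    }
    if not report["nvidia_smi_found"]:
        report["error"] = "nvidia-smi not found on PATH"
        return report
    try:
        # `nvidia-smi -L` prints one line per GPU, e.g. "GPU 0: ... (UUID: ...)".
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        report["error"] = str(exc)
        return report
    if result.returncode != 0:
        # A non-zero exit usually means the driver cannot talk to the GPU.
        report["error"] = result.stderr.strip() or f"exit code {result.returncode}"
    else:
        report["gpus"] = [
            line for line in result.stdout.splitlines() if line.startswith("GPU ")
        ]
    return report


if __name__ == "__main__":
    for key, value in gpu_visibility_report().items():
        print(f"{key}: {value}")
```

If `gpus` comes back empty or `error` is set on a pod where a GPU was rented, the problem is likely below the application (host driver or container setup), which is the kind of detail worth reporting along with the pod ID.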

11 Replies

Ercan (OP)•14mo ago

@ashleyk Tagging you, since you might know whether something has changed in the SD web UI repo or whether there is a general issue right now. All of our production is down.

ashleyk•14mo ago

You can use @Papa Madiator's GPU analyser tool in #📚｜resources. What GPU type and region? And is it secure cloud or community cloud?

Ercan (OP)•14mo ago

It's all GPUs in community cloud. You can try one as well; it reproduces on all 4090 pods.

ashleyk•14mo ago

I don't have issues with community cloud or 4090. Which template are you using?

Ercan (OP)•14mo ago

ashleyk•14mo ago

I don't use this template, so I can't help, unfortunately. Maybe @TheLastBen can advise why it's unable to find the GPU.

Ercan (OP)•14mo ago

I just launched a new pod on secure cloud and it seems to work, but community cloud has problems right now, either with the CUDA version or something else.

TheLastBen•14mo ago

just started a fresh 4090 pod, it works fine now

ashleyk•14mo ago

@xPaghkman next time it happens, provide the pod ID so RunPod can check it out and perhaps delist the machine if it's broken.

Ercan (OP)•14mo ago

One of them is g9tct1kr323mh0. I stopped the others and can't find their IDs. Anyway, we just moved all our processes to secure cloud. Have you tried running the CUDA is_available check?
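The "is_available" check mentioned here is most likely PyTorch's `torch.cuda.is_available()`, since the SD web UI runs on PyTorch; that is an assumption. A minimal sketch of running it from inside the pod, with a hypothetical helper name `torch_cuda_status` that also reports device names and degrades gracefully when PyTorch is not installed:

```python
import importlib.util


def torch_cuda_status() -> dict:
    """Report whether PyTorch can see a CUDA device.

    Wraps `torch.cuda.is_available()`; returns a "not available" result
    instead of raising when PyTorch is not installed in the environment.
    """
    if importlib.util.find_spec("torch") is None:
        return {
            "torch_installed": False,
            "cuda_available": False,
            "device_count": 0,
            "devices": [],
        }
    import torch

    # On a broken host this returns False (PyTorch may also emit a
    # "CUDA unknown error" warning) rather than raising.
    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    return {
        "torch_installed": True,
        "cuda_available": available,
        "device_count": count,
        "devices": [torch.cuda.get_device_name(i) for i in range(count)],
    }


if __name__ == "__main__":
    print(torch_cuda_status())
```

On a healthy 4090 pod one would expect `cuda_available` to be True with one device listed; False on a pod that rented a GPU points at the machine rather than the template.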

TheLastBen•14mo ago

I just generated an image
