RunPod · 3w ago
feesta

CUDA unavailable in a pod provisioned with a GPU

I started a community pod with one GPU (RTX 4090) using the official RunPod PyTorch image/template (runpod/pytorch:2.4.0-py3.11-cuda12.4). Immediately after the pod starts, PyTorch reports that CUDA is unavailable, even though nvidia-smi sees the GPU. This happens about 20% of the time I start a pod with this official container, and no errors appear in the system or container logs.

root@5c367a0d4ea2:/# python -c "import torch; print(torch.cuda.is_available())"
/usr/local/lib/python3.11/dist-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
root@5c367a0d4ea2:/# nvidia-smi
Mon Mar 24 15:59:01 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.05             Driver Version: 550.127.05     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:01:00.0 Off |                  Off |
|  0%   26C    P8             11W /  450W |       2MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
(abridged due to message length)
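The symptom above is a mismatch between two views of the same GPU: the driver (nvidia-smi) lists it, but PyTorch cannot initialize CUDA. A minimal sketch for checking both right after a pod starts, assuming only the Python standard library plus whatever torch the image ships; the script and its function names are hypothetical, not a RunPod tool. Each PyTorch check runs in a fresh interpreter, so a failed CUDA initialization in one attempt cannot carry over into the next.

```python
import shutil
import subprocess
import sys

TORCH_CHECK = "import torch; print(torch.cuda.is_available())"


def run(cmd, timeout=60):
    """Run cmd and return (exit_code, stdout, stderr); exit_code is None if it could not run."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return None, "", str(exc)
    return proc.returncode, proc.stdout, proc.stderr


def driver_sees_gpu():
    """True if nvidia-smi is installed and `nvidia-smi -L` lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    code, out, _ = run(["nvidia-smi", "-L"])
    return code == 0 and out.lstrip().startswith("GPU")


def torch_sees_gpu(python=sys.executable):
    """True if torch.cuda.is_available() prints True in a fresh interpreter.

    Warnings such as the UserWarning in this thread go to stderr, so only
    stdout is inspected for the printed boolean.
    """
    code, out, _ = run([python, "-c", TORCH_CHECK])
    return code == 0 and out.strip().endswith("True")


def diagnose(driver_ok, torch_ok):
    """Summarize the two checks in one line."""
    if not driver_ok:
        return "nvidia-smi does not list a GPU"
    if not torch_ok:
        return "driver lists the GPU, but PyTorch cannot initialize CUDA"
    return "PyTorch can use the GPU"


if __name__ == "__main__":
    driver_ok, torch_ok = driver_sees_gpu(), torch_sees_gpu()
    print(f"nvidia-smi lists a GPU: {driver_ok}")
    print(f"torch.cuda.is_available(): {torch_ok}")
    print(diagnose(driver_ok, torch_ok))
```

On an affected pod, the reported case would be the middle one: the driver lists the GPU, but PyTorch cannot initialize CUDA.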
feesta (OP) · 2w ago
Another pod, same result: immediately after it starts, the GPU is not available, even though the pod is configured with one RTX 4090.

ssh [email protected] -i ~/.ssh/id_ed25519
-- RUNPOD.IO --
Enjoy your Pod #93qymj5jda8e60 ^^
[RunPod ASCII-art banner]
For detailed documentation and guides, please visit:
https://docs.runpod.io/ and https://blog.runpod.io/

root@773fb48759c7:/# python -c "import torch; print(torch.cuda.is_available())"
/usr/local/lib/python3.11/dist-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
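This second warning explicitly points at CUDA_VISIBLE_DEVICES. A minimal sketch, assuming a Linux container, that prints that variable and NVIDIA_VISIBLE_DEVICES (the variable the NVIDIA Container Toolkit uses to choose which GPUs a container receives) and lists the /dev/nvidia* device nodes inside the pod. The function names are hypothetical. Nothing in this thread confirms either check as the cause. A missing /dev/nvidia-uvm node is one commonly reported source of this "CUDA unknown error" on Linux hosts, so the sketch flags it as a lead to follow up, not as a diagnosis.

```python
import glob
import os

VISIBILITY_VARS = ("CUDA_VISIBLE_DEVICES", "NVIDIA_VISIBLE_DEVICES")


def visibility_env(environ=None):
    """Return the GPU-visibility variables, with None for any that are unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in VISIBILITY_VARS}


def hides_all_gpus(cuda_visible_devices):
    """True if CUDA_VISIBLE_DEVICES is set to a value that hides every GPU from CUDA."""
    # Unset means "all GPUs"; an empty string or "-1" leaves CUDA with none.
    return cuda_visible_devices is not None and cuda_visible_devices.strip() in ("", "-1")


def nvidia_device_nodes(pattern="/dev/nvidia*"):
    """List the NVIDIA device nodes present inside the container."""
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    env = visibility_env()
    for name, value in env.items():
        print(f"{name}={value!r}")
    if hides_all_gpus(env["CUDA_VISIBLE_DEVICES"]):
        print("CUDA_VISIBLE_DEVICES hides every GPU from CUDA")
    nodes = nvidia_device_nodes()
    print("device nodes:", ", ".join(nodes) if nodes else "none found")
    # Flagged only as a lead (assumption), not a confirmed cause.
    if "/dev/nvidia-uvm" not in nodes:
        print("/dev/nvidia-uvm is not present")
```

Comparing this output between a working pod and a failing one, started from the same template, would show whether either value differs in the failing ~20%.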
Jason · 2w ago
Can you try running nvcc --version? Also, which template are you using, specifically?
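To answer the nvcc --version request in one go, a small sketch that collects three CUDA version numbers: the toolkit release reported by nvcc, the CUDA version in the nvidia-smi header (the highest CUDA version the driver supports), and torch.version.cuda (the CUDA version the installed PyTorch was built against). The regexes assume the output formats shown in the comments, and nvcc may not be on PATH in every image, in which case the sketch prints None for it. The helper names are hypothetical.

```python
import re
import shutil
import subprocess
import sys

# Patterns for the version strings each tool prints, e.g.
#   nvcc --version -> "Cuda compilation tools, release 12.4, V12.4.131"
#   nvidia-smi     -> "... CUDA Version: 12.4 ..." (highest CUDA version the driver supports)
NVCC_PATTERN = r"release (\d+)\.(\d+)"
SMI_PATTERN = r"CUDA Version: (\d+)\.(\d+)"


def parse_version(text, pattern):
    """Return (major, minor) from the first match of pattern in text, or None."""
    match = re.search(pattern, text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def command_output(cmd, timeout=60):
    """Return the stdout of cmd, or "" if the tool is missing, fails to start, or times out."""
    if shutil.which(cmd[0]) is None:
        return ""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout).stdout
    except (OSError, subprocess.TimeoutExpired):
        return ""


def torch_cuda_build():
    """CUDA version the installed PyTorch was built against (torch.version.cuda), or None."""
    out = command_output([sys.executable, "-c", "import torch; print(torch.version.cuda)"]).strip()
    # A CPU-only build prints "None"; a missing torch prints nothing to stdout.
    return None if out in ("", "None") else out


if __name__ == "__main__":
    print("nvcc toolkit:", parse_version(command_output(["nvcc", "--version"]), NVCC_PATTERN))
    print("driver CUDA version:", parse_version(command_output(["nvidia-smi"]), SMI_PATTERN))
    print("torch built with CUDA:", torch_cuda_build())
```

Reading torch.version.cuda does not initialize CUDA, so this check should still report a value on a pod where torch.cuda.is_available() returns False.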
