RunPod · 4mo ago
ehp

CUDA driver initialization failed

RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
(Serverless RTX4090, image FROM runpod/base:0.6.2-cuda11.8.0)

At first, the error below appeared at random times on random workers:

CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 23.64 GiB total capacity; 22.98 GiB already allocated; 4.81 MiB free; 23.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Then I added {"refresh_worker": True} to the handler output, and the CUDA driver initialization error above started occurring at random times/workers instead; it replaced the out-of-memory errors.
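For reference, a minimal sketch of how those two mitigations are usually wired into a RunPod serverless handler: PYTORCH_CUDA_ALLOC_CONF is set before torch touches CUDA (addressing the fragmentation hint in the OOM message), and refresh_worker is returned so RunPod recycles the worker after the job. run_inference is a hypothetical placeholder for the actual model call, and the max_split_size_mb value is only an example, not a tuned setting.

```python
import os

# Must be set before torch initializes CUDA; 512 is an example value.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:512")

import torch
import runpod


def run_inference(job_input):
    # Hypothetical placeholder for the real model/pipeline call.
    raise NotImplementedError


def handler(job):
    try:
        output = run_inference(job["input"])
        return {"output": output}
    except torch.cuda.OutOfMemoryError:
        # Free what we can, then ask RunPod to recycle this worker after the
        # job so the next job starts on a fresh process instead of a
        # fragmented one.
        torch.cuda.empty_cache()
        return {"error": "CUDA out of memory", "refresh_worker": True}


runpod.serverless.start({"handler": handler})
```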
3 Replies
yhlong00000 · 4mo ago
It looks like you’re running into memory issues. Maybe try a more powerful GPU with more VRAM.
ehp (OP) · 4mo ago
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu. Do you think this is also memory related? @yhlong00000
yhlong00000 · 4mo ago
This sounds like a CUDA driver or GPU configuration issue; we'll need more info to figure it out. Can you share the full Dockerfile or template you use?
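For context, a serverless template on the base image named above is typically just a thin layer like the sketch below. Everything in it (requirements.txt, handler.py, the python3 interpreter path) is a hypothetical stand-in, not the OP's actual files.

```dockerfile
# Hypothetical sketch of a serverless template on this base image;
# file names are placeholders, assuming python3 is on PATH in the image.
FROM runpod/base:0.6.2-cuda11.8.0

COPY requirements.txt /requirements.txt
RUN python3 -m pip install --no-cache-dir -r /requirements.txt

COPY handler.py /handler.py
CMD ["python3", "-u", "/handler.py"]
```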