CUDA driver initialization failed
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
(Serverless RTX4090)
(FROM runpod/base:0.6.2-cuda11.8.0)
-------
At first, the error below occurred at random times on random workers:
CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 23.64 GiB total capacity; 22.98 GiB already allocated; 4.81 MiB free; 23.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
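The message itself points at one mitigation: setting `max_split_size_mb` through `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch, assuming it helps with fragmentation in your workload; the value `512` is an illustrative guess, not something recommended in this thread, and the variable must be set before torch initializes CUDA:

```python
import os

# Must be set before the first CUDA allocation, so do it before importing torch.
# 512 MiB is an illustrative starting point; tune for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

# import torch  # only import torch after the variable is set
```

Alternatively, set it in the template's environment variables so it applies to every worker without code changes.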
Then I added
{"refresh_worker": True}
to the handler response, and the CUDA driver initialization error from the title started occurring at random times/workers instead. It replaced the OOM errors.
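For context, here is a hypothetical sketch of where `refresh_worker` sits in a RunPod serverless handler. `run_inference` is a placeholder for the real model call, and the always-refresh behavior mirrors what the post describes; this is an assumption about the setup, not the poster's actual code:

```python
def run_inference(job_input):
    # Placeholder for the actual model call on the GPU.
    return {"output": "ok"}

def handler(job):
    try:
        result = run_inference(job.get("input", {}))
    except RuntimeError as err:  # CUDA OOM surfaces as a RuntimeError in PyTorch
        # Recycle the worker so the next job starts in a fresh process.
        return {"error": str(err), "refresh_worker": True}
    # Always recycling after every job, as described above.
    result["refresh_worker"] = True
    return result

# if __name__ == "__main__":
#     import runpod
#     runpod.serverless.start({"handler": handler})
```

Refreshing after every job hides leaks but adds cold-start cost, and as seen here it can surface init-time failures instead of run-time ones.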
3 Replies
It looks like you’re running into memory issues. Maybe try a more powerful GPU with more VRAM.
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
Do you think this is also memory related? @yhlong00000
This sounds like a CUDA driver or GPU configuration issue; we'll need more info to figure it out. Can you share the full Dockerfile or template you use?
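While gathering that info, a quick diagnostic run inside the worker can show whether the driver is visible at all. A sketch, assuming `nvidia-smi` may be absent and torch may fail to import; both cases are reported rather than raised:

```python
import shutil
import subprocess

def cuda_diagnostics():
    """Collect basic evidence about driver visibility inside the container."""
    report = {}
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        report["nvidia_smi"] = out.stdout or out.stderr
    else:
        report["nvidia_smi"] = "nvidia-smi not found (driver not mounted?)"
    try:
        import torch
        report["torch_cuda"] = torch.cuda.is_available()
    except Exception as err:
        report["torch_cuda"] = f"torch check failed: {err}"
    return report
```

Logging this at worker startup would show whether the failure is in the host driver mount or in the PyTorch/CUDA runtime inside the image.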