CUDA driver initialization failed
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
(Serverless RTX4090)
(FROM runpod/base:0.6.2-cuda11.8.0)
-------
At first, the error below occurred at random times on random workers:
CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 23.64 GiB total capacity; 22.98 GiB already allocated; 4.81 MiB free; 23.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
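The message itself points at one mitigation: setting `max_split_size_mb` through `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch, assuming it helps with fragmentation in your workload; the value `512` is an illustrative guess, not something recommended in this thread, and the variable must be set before torch initializes CUDA:

```python
import os

# Must be set before the first CUDA allocation, so do it before importing torch.
# 512 MiB is an illustrative starting point; tune for your workload.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

# import torch  # only import torch after the variable is set
```

Alternatively, set it in the template's environment variables so it applies to every worker without code changes.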
Then I added
{"refresh_worker": True}
to the handler response, and the CUDA driver initialization error from the title started occurring at random times/workers instead. It replaced the OOM errors.
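For context, here is a hypothetical sketch of where `refresh_worker` sits in a RunPod serverless handler. `run_inference` is a placeholder for the real model call, and the always-refresh behavior mirrors what the post describes; this is an assumption about the setup, not the poster's actual code:

```python
def run_inference(job_input):
    # Placeholder for the actual model call on the GPU.
    return {"output": "ok"}

def handler(job):
    try:
        result = run_inference(job.get("input", {}))
    except RuntimeError as err:  # CUDA OOM surfaces as a RuntimeError in PyTorch
        # Recycle the worker so the next job starts in a fresh process.
        return {"error": str(err), "refresh_worker": True}
    # Always recycling after every job, as described above.
    result["refresh_worker"] = True
    return result

# if __name__ == "__main__":
#     import runpod
#     runpod.serverless.start({"handler": handler})
```

Refreshing after every job hides leaks but adds cold-start cost, and as seen here it can surface init-time failures instead of run-time ones.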
3 Replies
It looks like you’re running into memory issues. Maybe try a more powerful GPU with more VRAM.
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
Do you think this is also memory related? @yhlong00000
This sounds like a CUDA driver or GPU configuration issue; we'll need more info to figure it out. Can you share the full Dockerfile or template you use?
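While gathering that info, a quick diagnostic run inside the worker can show whether the driver is visible at all. A sketch, assuming `nvidia-smi` may be absent and torch may fail to import; both cases are reported rather than raised:

```python
import shutil
import subprocess

def cuda_diagnostics():
    """Collect basic evidence about driver visibility inside the container."""
    report = {}
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        report["nvidia_smi"] = out.stdout or out.stderr
    else:
        report["nvidia_smi"] = "nvidia-smi not found (driver not mounted?)"
    try:
        import torch
        report["torch_cuda"] = torch.cuda.is_available()
    except Exception as err:
        report["torch_cuda"] = f"torch check failed: {err}"
    return report
```

Logging this at worker startup would show whether the failure is in the host driver mount or in the PyTorch/CUDA runtime inside the image.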