•Created by ehp on 7/31/2024 in #⚡|serverless
CUDA driver initialization failed
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.
(Serverless RTX4090)
(FROM runpod/base:0.6.2-cuda11.8.0)
-------
At first, the following error appeared at random times on random workers:
CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 23.64 GiB total capacity; 22.98 GiB already allocated; 4.81 MiB free; 23.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
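(For reference, the max_split_size_mb option that the message mentions is normally set through the PYTORCH_CUDA_ALLOC_CONF environment variable before PyTorch first touches CUDA; a minimal sketch, with an arbitrary value:)

import os

# Sketch only: PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA
# allocation; 512 is an arbitrary example value, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # imported after setting the variable so the allocator picks it up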
Then I added
{"refresh_worker": True}
to the handler's return value, and the CUDA driver initialization error above started occurring at random times/workers instead. It replaced the OOM errors.
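For context, this is roughly where the flag sits; a minimal sketch assuming the standard runpod.serverless handler pattern, with run_inference standing in for the real model code:

import runpod

def handler(job):
    # run_inference is a placeholder for the actual model/inference code
    output = run_inference(job["input"])
    # Returning refresh_worker asks the platform to retire this worker after
    # the job finishes, so the next job starts on a freshly started worker.
    return {"refresh_worker": True, "job_results": output}

runpod.serverless.start({"handler": handler})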