Created by ehp on 7/31/2024 in #⚡|serverless
CUDA driver initialization failed
RuntimeError: CUDA driver initialization failed, you might not have a CUDA gpu.

(Serverless RTX4090) (FROM runpod/base:0.6.2-cuda11.8.0)

-------

At first, I was getting the error below at random times on random workers:

CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 23.64 GiB total capacity; 22.98 GiB already allocated; 4.81 MiB free; 23.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Then I added {"refresh_worker": True}, and the "CUDA driver initialization failed" error above started to occur instead, again at random times on random workers. One error simply replaced the other.
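For anyone reading the numbers in the out-of-memory message above, here is a rough sketch (not part of the original setup) that parses the message and compares reserved against allocated memory. The function name `summarize_oom` and the regular expressions are hypothetical and assume the exact message format quoted in this post. For this message the gap between reserved and allocated works out to roughly 123 MiB, well under the 512 MiB requested, which suggests the GPU was genuinely full rather than just fragmented, so `max_split_size_mb` alone may not help.

```python
import re

# The out-of-memory message exactly as quoted in the post.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 512.00 MiB (GPU 0; 23.64 GiB total "
    "capacity; 22.98 GiB already allocated; 4.81 MiB free; 23.10 GiB reserved "
    "in total by PyTorch)"
)

# Conversion factors to MiB for the two units that appear in the message.
_UNITS_IN_MIB = {"MiB": 1.0, "GiB": 1024.0}


def _to_mib(value, unit):
    """Convert a (number, unit) pair captured from the message to MiB."""
    return float(value) * _UNITS_IN_MIB[unit]


def summarize_oom(message):
    """Extract memory figures (in MiB) from an OOM message of the quoted format.

    Hypothetical helper: the patterns below are assumptions based only on the
    message text shown in this thread.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    stats = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        stats[name] = _to_mib(*match.groups())

    # Memory held by the caching allocator but not backing live tensors.
    stats["slack"] = stats["reserved"] - stats["allocated"]
    # Could the request fit even if all slack and free memory were usable?
    stats["fits_in_slack_plus_free"] = (
        stats["requested"] <= stats["slack"] + stats["free"]
    )
    return stats


if __name__ == "__main__":
    for key, value in summarize_oom(OOM_MESSAGE).items():
        print(f"{key}: {value}")
```

If `fits_in_slack_plus_free` is false, as it is here, the more likely direction is reducing what stays resident on the GPU between jobs rather than tuning the allocator.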