Failed to initialize NVML: Unknown Error

(compress) root@1908bfec7b85:/workspace# nvidia-smi
Failed to initialize NVML: Unknown Error
Every hour or so on my RunPod instance, I get the above NVIDIA error. I'm not changing anything on the machine, and the only way I've found to fix it is to restart the pod. Any ideas? Thanks
3 Replies
nerdylive, 5mo ago
Stack Overflow
Failed to initialize NVML: Unknown Error in Docker after Few hours
I am having interesting and weird issue. When I start docker container with gpu it works fine and I see all the gpus in docker. However, few hours or few days later, I can't use gpus in docker. Whe...
yhlong00000, 5mo ago
It looks like something needs to be changed on the host server. You might want to create a support ticket with your pod ID and attach the link nerdylive just posted; hopefully they can fix it.
nerdylive, 5mo ago
Yup, but I guess there's no chance RunPod would allow "higher-privileged containers". It might still be worth contacting support, but I'm not sure the change would be allowed or would work.
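For anyone putting together the support ticket suggested above, here is a minimal Python sketch that checks whether `nvidia-smi` is currently reporting the NVML failure from the original post. It does not fix the underlying problem. The function names (`classify`, `check_gpu`) and the three status labels are made up for illustration, and the only thing it assumes about `nvidia-smi` is that the error text shown in the post appears somewhere in its output.

```python
import subprocess

# Error text copied from the nvidia-smi output in the original post.
NVML_ERROR = "Failed to initialize NVML"


def classify(returncode: int, output: str) -> str:
    """Map an nvidia-smi result to 'ok', 'nvml_lost', or 'other_error'.

    The labels are hypothetical names chosen for this sketch.
    """
    if NVML_ERROR in output:
        return "nvml_lost"
    return "ok" if returncode == 0 else "other_error"


def check_gpu(timeout_s: float = 30.0) -> str:
    """Run nvidia-smi once and classify what it printed."""
    try:
        proc = subprocess.run(
            ["nvidia-smi"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # nvidia-smi is missing from PATH or did not return in time.
        return "other_error"
    # The message may go to stdout or stderr, so check both.
    return classify(proc.returncode, proc.stdout + proc.stderr)
```

Calling `check_gpu()` from a loop or a cron job and logging the time of the first `"nvml_lost"` result would give support a concrete timestamp to go with the pod ID.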