Failed to initialize NVML: Unknown Error
(compress) root@1908bfec7b85:/workspace# nvidia-smi
Every hour or so on my RunPod instance, I get the above NVIDIA error.
I'm not changing anything on the machine, and I have to restart it to fix it. Any ideas?
3 Replies
Stack Overflow
Failed to initialize NVML: Unknown Error in Docker after Few hours
I am having an interesting and weird issue.
When I start a Docker container with GPU access, it works fine and I see all the GPUs in Docker. However, a few hours or a few days later, I can't use the GPUs in Docker.
Looks like something needs to be changed on the host server. You might want to create a support ticket with the pod ID and attach the link nerdylive just posted; hopefully they can fix it.
Yup, but I guess there's no chance RunPod would allow "higher privileged containers". It might still be worth contacting support, but I'm not sure that would be allowed or would work.
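For the support ticket, it may help to record exactly when the GPU drops out. Below is a rough sketch of a watcher, assuming Python is available inside the pod. The names `check_gpu` and `watch` and the `gpu_watch.log` path are made up for illustration; the only external command it runs is `nvidia-smi`.

```python
import subprocess
import time
from datetime import datetime, timezone


def check_gpu(cmd=("nvidia-smi",), timeout=30):
    """Run cmd and return (ok, output); ok is False if the command fails."""
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout
        )
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # Binary missing or hung: treat as a failure and keep the reason.
        return False, str(exc)
    output = (result.stdout + result.stderr).strip()
    # Check both the exit code and the error text from the original post.
    ok = result.returncode == 0 and "Failed to initialize NVML" not in output
    return ok, output


def watch(interval_s=60, log_path="gpu_watch.log"):
    """Poll until the first failure, appending a UTC timestamp per check."""
    while True:
        ok, output = check_gpu()
        stamp = datetime.now(timezone.utc).isoformat()
        with open(log_path, "a") as log:
            log.write(f"{stamp} ok={ok}\n")
            if not ok:
                # Save the full error text so it can be pasted into a ticket.
                log.write(output + "\n")
                return stamp
        time.sleep(interval_s)
```

Running `watch()` in a background session right after a restart should leave a log whose last entry is the time of the first failed check. That only narrows down when it breaks, not why.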