elan
Pod system error
I've been running this pod for over 6 months, and it has suddenly started having issues. Although its status says "running", the system logs show this error repeatedly:
2024-06-10T12:28:12Z start container
2024-06-10T12:28:14Z error starting container: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: device error: GPU-xxxxxxx: unknown device: unknown
2024-06-10T12:28:32Z start container
I've already submitted a ticket. This is extremely time-sensitive, and any help would be appreciated.