GPU requires reset
I restarted and re-created the pod a couple of times and got the same error on container start each time. I assume it keeps grabbing the same bad node. I was able to start the container by switching to a different instance type.
2024-08-26T21:15:45Z error creating container: nvidia-smi: parsing output of line 5: failed to parse ([GPU requires reset]) into int: strconv.Atoi: parsing "": invalid syntax
Pod ID: 2hvpqmtrowunjp
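For anyone hitting the same message: the error suggests that an nvidia-smi field expected to be an integer contained `[GPU requires reset]` instead. Below is a minimal sketch, not RunPod's actual code, of how a script could check for that marker before parsing a field as a number. The function name `parse_gpu_field` and the constant `RESET_MARKER` are hypothetical, made up for illustration.

```python
import re

# Marker text taken from the error message above.
RESET_MARKER = "GPU requires reset"


def parse_gpu_field(raw: str):
    """Return (value, needs_reset) for one nvidia-smi output field.

    value is an int when the field is purely numeric, otherwise None.
    needs_reset is True when the field carries the reset marker.
    """
    text = raw.strip()
    if RESET_MARKER in text:
        # The GPU reported a fault state instead of a number.
        return None, True
    if re.fullmatch(r"\d+", text):
        return int(text), False
    # Empty or otherwise non-numeric field: no value, no reset marker.
    return None, False
```

Checking for the marker first lets the caller report "this node's GPU needs a reset" instead of a generic integer-parsing failure.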
If you have not done so already, I suggest you open a ticket on the RunPod website. There is a contact link.
I did. The automatic e-mail said it would take a few days, and to try Discord. I'm unblocked anyway after switching instance types. Just trying to be helpful for the next person who gets that node.
This machine has an issue, so we have just unlisted it. You may want to create a new pod.