MI300X HIP error: no ROCm-capable device is detected
I'm using an MI300X and getting this error:
RuntimeError: HIP error: no ROCm-capable device is detected
I'm on the RunPod PyTorch 2.4.0 ROCm 6.1 template. How can I resolve this?
1 Reply
I'm having the same issue, but it only started recently (about 30 minutes ago). It seems to be triggered by a GPU hang, and restarting the pod doesn't fix it. When this happens on our own nodes, it's usually fixed automatically.
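To help tell a container-side problem from a host-side one, here is a minimal diagnostic sketch. It assumes a Linux pod where ROCm exposes the GPU through `/dev/kfd` and `/dev/dri/renderD*`. The function name `rocm_device_report` is hypothetical and not part of PyTorch or RunPod. It only reports what the container can see and does not recover a hung GPU.

```python
import glob
import os


def rocm_device_report():
    """Collect basic signs of whether a ROCm GPU is visible in this container."""
    report = {
        # /dev/kfd is the ROCm compute driver interface; the pod needs access to it.
        "kfd_present": os.path.exists("/dev/kfd"),
        # DRM render nodes that the container can see.
        "render_nodes": sorted(glob.glob("/dev/dri/renderD*")),
        # These variables can hide devices from HIP if set to an empty or wrong value.
        "visible_devices_env": {
            name: os.environ.get(name)
            for name in ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES")
        },
    }
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the device-file checks are reported.
        report["torch_hip_version"] = None
        report["torch_device_count"] = None
    else:
        # torch.version.hip is None on builds that were not compiled for ROCm.
        report["torch_hip_version"] = torch.version.hip
        report["torch_device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in rocm_device_report().items():
        print(f"{key}: {value}")
```

If the device files are present and the environment variables look sane but the device count is still 0 after a pod restart, that would point toward a host-side problem, such as the GPU hang described above. In that case the output is useful to include in a support request.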