MI300X NCCL Issue
I’m experiencing an issue with the MI300X pod. Two GPUs are configured, but I’m unable to run the basic all_reduce_perf test on the pod.
5 Replies
@GK
Escalated To Zendesk
The thread has been escalated to Zendesk!
Ticket ID: #11,678
Do you have the correct drivers installed?
I’m new to RunPod and have been using the RunPod PyTorch 2.4.0 ROCm 6.1 template. It worked properly until last Friday, but the same template no longer works. I haven’t installed any drivers beyond what comes with the template.
Maybe follow up on the ticket, then.
Send them your pod ID too.
okay