P2P is disabled between NVLINK connected GPUs 1 and 0
Hey team! Could you fix the NVLink issue on H100 SXM Community pods? I run into this error frequently. Corrupted pod ID: 4a5acwxj2kene6
P2P is disabled between NVLINK connected GPUs 1 and 0. This should not be the case given their connectivity, and is probably due to a hardware issue. If you still want to proceed, you can set NCCL_IGNORE_DISABLED_P2P=1.
I can proceed with the NCCL_IGNORE_DISABLED_P2P flag, but this drops performance by about 10%.
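For anyone who needs the same workaround, here is a minimal sketch of setting the variable from Python before launching a job. The variable name comes from the NCCL message above. The helper name `nccl_env_ignoring_disabled_p2p` is hypothetical, and the sketch assumes the training job is started as a child process that inherits this environment.

```python
import os


def nccl_env_ignoring_disabled_p2p(base_env=None):
    """Return a copy of the environment with NCCL_IGNORE_DISABLED_P2P=1 set.

    Hypothetical helper: it copies the given mapping (or os.environ) and
    leaves the original untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Name and value are taken from the NCCL warning quoted in this thread.
    env["NCCL_IGNORE_DISABLED_P2P"] = "1"
    return env


if __name__ == "__main__":
    env = nccl_env_ignoring_disabled_p2p()
    print("NCCL_IGNORE_DISABLED_P2P =", env["NCCL_IGNORE_DISABLED_P2P"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when starting the training command. This only suppresses the failure, so the performance drop mentioned above still applies.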
3 Replies
Forwarded it to the team.
Solution
@storuky2306 so I got a response, and apparently GPU 5 does not support P2P.
What we can advise for now is to pick a different machine.
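When landing on a different machine, a quick check like the one below may help spot the problem before starting a long job. This is only a sketch: it assumes PyTorch is installed (the thread does not say which framework is in use), the function name `disabled_p2p_pairs` is hypothetical, and it is an assumption that PyTorch's peer-access query reflects the same condition NCCL reports.

```python
def disabled_p2p_pairs():
    """Return (i, j) GPU index pairs for which PyTorch reports no peer access.

    Returns an empty list when PyTorch is not installed or CUDA is unavailable,
    so an empty result alone does not prove that P2P works.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    count = torch.cuda.device_count()
    pairs = []
    for i in range(count):
        for j in range(count):
            # Peer access is directional, so both (i, j) and (j, i) are checked.
            if i != j and not torch.cuda.can_device_access_peer(i, j):
                pairs.append((i, j))
    return pairs


if __name__ == "__main__":
    bad = disabled_p2p_pairs()
    if bad:
        print("No P2P access between GPU pairs:", bad)
    else:
        print("No disabled P2P pairs reported (or no CUDA devices visible).")
```

If the list is non-empty on a fresh pod, switching machines again is probably quicker than debugging it.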
@Papa Madiator ok, thanks