RunPod2mo ago
key8962

NVLink support for H100 NVL

When I run nvidia-smi topo -m on the H100 NVL * 2 pod, I can see the PIX topology between GPU0 and GPU1. Can I use an NVLink connection to interconnect the H100 NVL GPUs? How does PIX (PCIe bridge) performance differ from NVLink?
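(One quick way to check from inside the pod: the minimal sketch below, assuming the nvidia-ml-py / pynvml Python bindings are installed, counts the active NVLink links on each GPU. Zero active links, together with a PIX entry in nvidia-smi topo -m, means the two GPUs only talk over a PCIe bridge.)

```python
# Minimal sketch: count active NVLink links per GPU with NVML.
# Assumes the nvidia-ml-py (pynvml) package is installed: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml returns bytes
            name = name.decode()
        active_links = 0
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                if pynvml.nvmlDeviceGetNvLinkState(handle, link) == pynvml.NVML_FEATURE_ENABLED:
                    active_links += 1
            except pynvml.NVMLError:
                break  # no more NVLink links on this device
        print(f"GPU {i} ({name}): {active_links} active NVLink link(s)")
finally:
    pynvml.nvmlShutdown()
```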
14 Replies
nerdylive
nerdylive2mo ago
Here's a result from an AI:

NVLink: NVLink represents a significant leap in GPU interconnect bandwidth, enabling ultra-fast data exchange between GPUs and typically found on SXM-form-factor boards.
- Bandwidth: NVIDIA's H100 GPUs can reach up to 900 GB/s of NVLink interconnect bandwidth.
- Applications that benefit: large-scale deep learning and AI model training, high-performance computing simulations, data-intensive scientific research.

PCIe: PCIe (Peripheral Component Interconnect Express) is the traditional backbone for GPU interconnectivity in servers.
- Strengths: flexible and compatible with a diverse range of server architectures; suits AI applications where the inter-GPU communication load is moderate.
- Bandwidth: lower than NVLink, but a cost-effective solution for scenarios that don't require NVLink-level bandwidth.
- Ideal use cases: inference and lightweight AI workloads, small- to medium-scale model training, general-purpose computing that needs GPU acceleration.

Performance comparison: NVLink shines where maximizing GPU-to-GPU bandwidth is paramount, offering superior performance for HPC and extensive AI model training. PCIe appeals to applications with moderate bandwidth requirements, providing a flexible and economical solution.

In summary: if you require maximum inter-GPU bandwidth, NVLink is the way to go; for more moderate requirements, PCIe offers flexibility and cost-effectiveness. 🚀

I'm not sure how you'd enable the NVLink connection to interconnect them, but I think it's already set up @flash-singh
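(For a rough feel of the practical difference, the interconnect can be timed directly. The sketch below assumes PyTorch with CUDA and at least 2 visible GPUs; the 4 GiB buffer size is an arbitrary choice. Depending on driver and peer-to-peer support the copy may be staged through host memory, so treat the number only as a rough indicator.)

```python
# Rough sketch: time a GPU0 -> GPU1 copy to compare interconnect bandwidth.
import time
import torch

assert torch.cuda.device_count() >= 2, "need at least 2 GPUs"

size_gib = 4
x = torch.empty(size_gib * 1024**3, dtype=torch.uint8, device="cuda:0")

# Warm-up copy so allocation and context setup are not timed.
_ = x.to("cuda:1")
torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")

start = time.perf_counter()
y = x.to("cuda:1")
torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")
elapsed = time.perf_counter() - start

print(f"GPU0 -> GPU1: {size_gib / elapsed:.1f} GiB/s")
```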
key8962
key89622mo ago
@nerdylive thank you! When I use the H100 SXM5 GPU, the nvidia-smi topo -m command shows the GPUs interconnected with an NV# topology, indicating NVLink usage. This differs from what I see on H100 NVL, so it would be helpful to confirm whether the H100 NVL pod uses NVLink or not!
nerdylive
nerdylive2mo ago
Yeah, I'm not sure about the NVLink setup, but H100 NVL is a GPU type that should be optimized for NVLink. You can also ask this through website support
key8962
key89622mo ago
@nerdylive i got it, thank you!
flash-singh
flash-singh2mo ago
H100 NVL does use NVLink, but it's paired between 2 GPUs, and our software isn't optimized to give you 2 that are paired unless you ask for all 8x. It's possible that when you ask for 2 GPUs you get ones that are not paired. We plan to optimize this soon so they're always paired with NVLink
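(One way to verify what the scheduler actually handed out is to ask NVML whether GPU0's NVLink links terminate at GPU1. A minimal sketch, again assuming nvidia-ml-py; if it reports no NVLink, the two GPUs you were allocated come from different NVLink pairs on that host.)

```python
# Sketch: check whether GPU0 and GPU1 on this pod are NVLink-paired.
# Exits nonzero if no NVLink path between them is found.
import sys
import pynvml

pynvml.nvmlInit()
try:
    gpu0 = pynvml.nvmlDeviceGetHandleByIndex(0)
    gpu1 = pynvml.nvmlDeviceGetHandleByIndex(1)
    gpu1_bus_id = pynvml.nvmlDeviceGetPciInfo(gpu1).busId

    paired = False
    for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
        try:
            if pynvml.nvmlDeviceGetNvLinkState(gpu0, link) != pynvml.NVML_FEATURE_ENABLED:
                continue
            remote = pynvml.nvmlDeviceGetNvLinkRemotePciInfo(gpu0, link)
            if remote.busId == gpu1_bus_id:
                paired = True
                break
        except pynvml.NVMLError:
            break  # ran out of links on this device
    print("GPU0 and GPU1 are NVLink-paired" if paired else "No NVLink between GPU0 and GPU1")
    sys.exit(0 if paired else 1)
finally:
    pynvml.nvmlShutdown()
```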
nerdylive
nerdylive2mo ago
ahh
flash-singh
flash-singh2mo ago
you can see high level differences here https://www.nvidia.com/en-us/data-center/h100/
nerdylive
nerdylive2mo ago
Wow they are actually faster
flash-singh
flash-singh2mo ago
NVL is 2 gpus
nerdylive
nerdylive2mo ago
oh like 2xSXM?
flash-singh
flash-singh2mo ago
yep, it's a marketing gimmick, they're comparing a single GPU to NVL, which is 2 GPUs
nerdylive
nerdylive2mo ago
ahh icic
key8962
key89622mo ago
Thank you so much, that explains why the 2 H100 NVL were slower than 2 SXM. Hope it gets optimized very soon! @flash-singh In theory, could I get 2 NVLink-paired H100 NVLs by trying several times?
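(If you do retry, a quick post-start check can tell you whether to keep the pod. The sketch below is a hypothetical helper: it assumes nvidia-smi is on PATH and the usual topo-matrix layout, which can vary across driver versions. It parses nvidia-smi topo -m and reports whether the GPU0-GPU1 cell is an NV# entry or PIX.)

```python
# Sketch: quick post-start check using `nvidia-smi topo -m` output.
# "NV#" in the GPU0<->GPU1 cell means NVLink; "PIX" means PCIe-bridge only,
# in which case you could tear the pod down and try again.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "topo", "-m"], capture_output=True, text=True, check=True
).stdout

for line in out.splitlines():
    if line.startswith("GPU0"):
        # Columns in GPU0's row: label, GPU0 ("X"), GPU1, ... (layout may vary by driver)
        gpu1_cell = line.split()[2]
        print("NVLink pair" if gpu1_cell.startswith("NV") else f"Not NVLinked ({gpu1_cell})")
        break
```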
flash-singh
flash-singh2mo ago
yes