RunPod2mo ago
key8962

NVLink support for H100 NVL

When I run nvidia-smi topo -m on the H100 NVL * 2 pod, I can see the PIX topology between GPU0 and GPU1. Can I use an NVLink connection to interconnect the H100 NVL GPUs? How does PIX (PCIe bridge) performance differ from NVLink?
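(One quick way to check from inside the pod: the minimal sketch below, assuming the nvidia-ml-py / pynvml Python bindings are installed, counts the active NVLink links on each GPU. Zero active links, together with a PIX entry in nvidia-smi topo -m, means the two GPUs only talk over a PCIe bridge.)

```python
# Minimal sketch: count active NVLink links per GPU with NVML.
# Assumes the nvidia-ml-py (pynvml) package is installed: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
try:
    count = pynvml.nvmlDeviceGetCount()
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older pynvml returns bytes
            name = name.decode()
        active_links = 0
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                if pynvml.nvmlDeviceGetNvLinkState(handle, link) == pynvml.NVML_FEATURE_ENABLED:
                    active_links += 1
            except pynvml.NVMLError:
                break  # no more NVLink links on this device
        print(f"GPU {i} ({name}): {active_links} active NVLink link(s)")
finally:
    pynvml.nvmlShutdown()
```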
14 Replies
nerdylive
nerdylive2mo ago
Here's a result from an AI:

NVLink: NVLink represents a significant leap in GPU interconnect bandwidth, enabling ultra-fast data exchange between GPUs and typically found on SXM-form-factor boards.
- Bandwidth: NVIDIA's H100 GPUs can reach up to 900 GB/s of NVLink interconnect bandwidth.
- Applications that benefit: large-scale deep learning and AI model training, high-performance computing simulations, data-intensive scientific research.

PCIe: PCIe (Peripheral Component Interconnect Express) is the traditional backbone for GPU interconnectivity in servers.
- Strengths: flexible and compatible with a diverse range of server architectures; suits AI applications where the inter-GPU communication load is moderate.
- Bandwidth: lower than NVLink, but a cost-effective solution for scenarios that don't require NVLink-level bandwidth.
- Ideal use cases: inference and lightweight AI workloads, small- to medium-scale model training, general-purpose computing that needs GPU acceleration.

Performance comparison: NVLink shines where maximizing GPU-to-GPU bandwidth is paramount, offering superior performance for HPC and extensive AI model training. PCIe appeals to applications with moderate bandwidth requirements, providing a flexible and economical solution.

In summary: if you require maximum inter-GPU bandwidth, NVLink is the way to go; for more moderate requirements, PCIe offers flexibility and cost-effectiveness. 🚀

I'm not sure how you'd enable the NVLink connection to interconnect them, but I think it's already set up @flash-singh
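(For a rough feel of the practical difference, the interconnect can be timed directly. The sketch below assumes PyTorch with CUDA and at least 2 visible GPUs; the 4 GiB buffer size is an arbitrary choice. Depending on driver and peer-to-peer support the copy may be staged through host memory, so treat the number only as a rough indicator.)

```python
# Rough sketch: time a GPU0 -> GPU1 copy to compare interconnect bandwidth.
import time
import torch

assert torch.cuda.device_count() >= 2, "need at least 2 GPUs"

size_gib = 4
x = torch.empty(size_gib * 1024**3, dtype=torch.uint8, device="cuda:0")

# Warm-up copy so allocation and context setup are not timed.
_ = x.to("cuda:1")
torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")

start = time.perf_counter()
y = x.to("cuda:1")
torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")
elapsed = time.perf_counter() - start

print(f"GPU0 -> GPU1: {size_gib / elapsed:.1f} GiB/s")
```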
key8962
key89622mo ago
@nerdylive thank you! When I use the H100 SXM5 GPU, the nvidia-smi topo -m command shows the GPUs interconnected with an NV# topology, indicating NVLink usage. This differs from what I see on H100 NVL, so it would be helpful to confirm whether the H100 NVL pod uses NVLink or not!
nerdylive
nerdylive2mo ago
Yeah, I'm not sure about the NVLink setup, but H100 NVL is a GPU type that should be optimized for NVLink. You can also ask this through website support
key8962
key89622mo ago
@nerdylive i got it, thank you!
flash-singh
flash-singh2mo ago
H100 NVL does use NVLink, but it's paired between 2 GPUs, and our software isn't optimized to give you 2 that are paired unless you ask for all 8x. It's possible that when you ask for 2 GPUs you get ones that are not paired. We plan to optimize this soon so they're always paired with NVLink
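(One way to verify what the scheduler actually handed out is to ask NVML whether GPU0's NVLink links terminate at GPU1. A minimal sketch, again assuming nvidia-ml-py; if it reports no NVLink, the two GPUs you were allocated come from different NVLink pairs on that host.)

```python
# Sketch: check whether GPU0 and GPU1 on this pod are NVLink-paired.
# Exits nonzero if no NVLink path between them is found.
import sys
import pynvml

pynvml.nvmlInit()
try:
    gpu0 = pynvml.nvmlDeviceGetHandleByIndex(0)
    gpu1 = pynvml.nvmlDeviceGetHandleByIndex(1)
    gpu1_bus_id = pynvml.nvmlDeviceGetPciInfo(gpu1).busId

    paired = False
    for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
        try:
            if pynvml.nvmlDeviceGetNvLinkState(gpu0, link) != pynvml.NVML_FEATURE_ENABLED:
                continue
            remote = pynvml.nvmlDeviceGetNvLinkRemotePciInfo(gpu0, link)
            if remote.busId == gpu1_bus_id:
                paired = True
                break
        except pynvml.NVMLError:
            break  # ran out of links on this device
    print("GPU0 and GPU1 are NVLink-paired" if paired else "No NVLink between GPU0 and GPU1")
    sys.exit(0 if paired else 1)
finally:
    pynvml.nvmlShutdown()
```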
nerdylive
nerdylive2mo ago
ahh
flash-singh
flash-singh2mo ago
you can see high level differences here https://www.nvidia.com/en-us/data-center/h100/
nerdylive
nerdylive2mo ago
Wow they are actually faster
flash-singh
flash-singh2mo ago
NVL is 2 gpus
nerdylive
nerdylive2mo ago
oh like 2xSXM?
flash-singh
flash-singh2mo ago
yep, it's a marketing gimmick, they're comparing a single GPU to NVL, which is 2 GPUs
nerdylive
nerdylive2mo ago
ahh icic
key8962
key89622mo ago
Thank you so much, that explains why the 2 H100 NVL were slower than 2 SXM. Hope it gets optimized very soon! @flash-singh In theory, could I get 2 NVLink-paired H100 NVLs by trying several times?
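(If you do retry, a quick post-start check can tell you whether to keep the pod. The sketch below is a hypothetical helper: it assumes nvidia-smi is on PATH and the usual topo-matrix layout, which can vary across driver versions. It parses nvidia-smi topo -m and reports whether the GPU0-GPU1 cell is an NV# entry or PIX.)

```python
# Sketch: quick post-start check using `nvidia-smi topo -m` output.
# "NV#" in the GPU0<->GPU1 cell means NVLink; "PIX" means PCIe-bridge only,
# in which case you could tear the pod down and try again.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "topo", "-m"], capture_output=True, text=True, check=True
).stdout

for line in out.splitlines():
    if line.startswith("GPU0"):
        # Columns in GPU0's row: label, GPU0 ("X"), GPU1, ... (layout may vary by driver)
        gpu1_cell = line.split()[2]
        print("NVLink pair" if gpu1_cell.startswith("NV") else f"Not NVLinked ({gpu1_cell})")
        break
```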
flash-singh
flash-singh2mo ago
yes