RunPod · 6mo ago
jon691

8x H100 SXM5, Error 802

I'm getting an "Error 802: system not yet initialized" on an 8x H100 SXM5 community pod. Running nv-fabricmanager gives this error:

    # /usr/bin/nv-fabricmanager -c ~/nvswitch/fabricmanager.cfg
    request to query NVSwitch device information from NVSwitch driver failed with error:Failed to load the requested module [NV_ERR_MODULE_LOAD_FAILED]

From nvidia-smi:

    Fabric
        State  : Completed
        Status : Success

My workload runs smoothly on the 8x H100 PCIe pod.
1 Reply
jon691 · 6mo ago
Maybe a misconfigured pod. The error's gone on a new 8x H100 SXM5 instance.
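
For anyone landing here with the same error: a minimal sketch for checking whether CUDA initializes on a fresh pod before starting a long job. It calls `cuInit(0)` from the driver library through `ctypes` and reports the result code. The function names `cu_init_code` and `describe` are made up for this example, and the hint printed for 802 is an assumption based on this thread rather than an official diagnosis.

```python
import ctypes

# Driver API result codes from cuda.h; only the ones relevant here.
CUDA_SUCCESS = 0
CUDA_ERROR_NO_DEVICE = 100
CUDA_ERROR_SYSTEM_NOT_READY = 802  # reported as "system not yet initialized"


def cu_init_code(lib_name="libcuda.so.1"):
    """Call cuInit(0) and return its integer result.

    Returns None if the driver library cannot be loaded at all.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


def describe(code):
    """Turn a cuInit result (or None) into a short human-readable message."""
    if code is None:
        return "libcuda not found: no NVIDIA driver visible in this environment"
    if code == CUDA_SUCCESS:
        return "cuInit OK"
    if code == CUDA_ERROR_NO_DEVICE:
        return "error 100: no CUDA-capable device detected"
    if code == CUDA_ERROR_SYSTEM_NOT_READY:
        # Hint is an assumption drawn from this thread, not a definitive cause.
        return ("error 802: system not yet initialized; on NVSwitch systems "
                "this may point at Fabric Manager or a misconfigured host")
    return f"cuInit failed with error {code}"


if __name__ == "__main__":
    print(describe(cu_init_code()))
```

If this prints the 802 message on a community pod, there is probably little to fix from inside the container. Switching to a new instance is what worked above.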