jon691
8x H100 SXM5, Error 802
I'm getting an "Error 802: system not yet initialized" on an 8x H100 SXM5 community pod.
Running nv-fabricmanager gives this error:
# /usr/bin/nv-fabricmanager -c ~/nvswitch/fabricmanager.cfg
request to query NVSwitch device information from NVSwitch driver failed with error:Failed to load the requested module [NV_ERR_MODULE_LOAD_FAILED]
From nvidia-smi:
Fabric
State : Completed
Status : Success
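In case it helps anyone checking the same fields across all 8 GPUs, here is a minimal sketch that parses the Fabric block from text in the form shown above. It assumes the block comes from `nvidia-smi -q` and that each GPU reports its own "Fabric" section with "State" and "Status" keys. The function names are hypothetical helpers of mine, not part of any NVIDIA tool.

```python
import subprocess


def parse_fabric_blocks(text):
    """Return one dict per 'Fabric' block found in nvidia-smi style text."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Fabric":
            # Start of a new Fabric section (assumed: one per GPU).
            current = {}
            blocks.append(current)
        elif current is not None and ":" in line:
            # "Key : Value" line belonging to the current Fabric section.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
        else:
            # Blank line or another section header ends the block.
            current = None
    return blocks


def fabric_ready(blocks):
    """True only if at least one block exists and all report Completed/Success."""
    return bool(blocks) and all(
        b.get("State") == "Completed" and b.get("Status") == "Success"
        for b in blocks
    )


def read_fabric_from_nvidia_smi():
    """Run `nvidia-smi -q` (assumed to include the Fabric section) and parse it."""
    out = subprocess.run(
        ["nvidia-smi", "-q"], capture_output=True, text=True, check=True
    ).stdout
    return parse_fabric_blocks(out)


# Sample text copied from the output in this post.
SAMPLE = """Fabric
State : Completed
Status : Success
"""
```

On the output pasted above, `fabric_ready(parse_fabric_blocks(SAMPLE))` evaluates to true, so the reported fabric state on its own does not seem to explain the Error 802.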
My workload runs smoothly on the 8x H100 PCIe pod.