RunPod
Created by jon691 on 1/16/2024 in #⛅|pods
8x H100 SXM5, Error 802
I'm getting an "Error 802: system not yet initialized" on an 8x H100 SXM5 community pod. Running nv-fabricmanager gives this error:

# /usr/bin/nv-fabricmanager -c ~/nvswitch/fabricmanager.cfg
request to query NVSwitch device information from NVSwitch driver failed with error:Failed to load the requested module [NV_ERR_MODULE_LOAD_FAILED]

From nvidia-smi:

Fabric
    State : Completed
    Status : Success

My workload runs smoothly on the 8x H100 PCIe pod.
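For anyone who wants to script the same check across all eight GPUs, here is a rough sketch that pulls the Fabric State/Status fields out of `nvidia-smi -q` text. The function names `parse_fabric_status` and `fabric_ready` are made up for illustration, and the parser assumes the layout quoted above (a `Fabric` line followed by `State` and `Status` lines), which may differ between driver versions.

```python
def parse_fabric_status(nvidia_smi_text):
    """Return one (state, status) pair per 'Fabric' section in the text.

    Assumes the layout from the post: a line reading 'Fabric' followed by
    'State : ...' and 'Status : ...' lines. Hypothetical helper, not an
    NVIDIA API.
    """
    results = []
    lines = nvidia_smi_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() != "Fabric":
            continue
        fields = {}
        # Look at the few lines right after the section header.
        for follow in lines[i + 1:i + 4]:
            if ":" in follow:
                key, _, value = follow.partition(":")
                fields[key.strip()] = value.strip()
        results.append((fields.get("State"), fields.get("Status")))
    return results


def fabric_ready(nvidia_smi_text):
    """True only if at least one Fabric section exists and all report Completed/Success."""
    pairs = parse_fabric_status(nvidia_smi_text)
    return bool(pairs) and all(p == ("Completed", "Success") for p in pairs)
```

The text can be fed in from `subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout`. Note that in this case the fabric already reports Completed/Success while Error 802 still occurs, so a passing check here does not by itself rule out the problem.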
3 replies