RunPod • Created by bwa on 12/4/2024 in #⛅|pods
Faulty node?
Since this morning, I have encountered this error multiple times: 'CUDA error: uncorrectable ECC error encountered'.
Every time, terminating the pod and starting a new one made the problem go away.
All incidents were on US-GA-2, H100-PCIe.
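
For anyone who wants to check whether a given pod landed on a GPU that is reporting ECC problems, here is a minimal sketch that reads the uncorrected ECC counter from `nvidia-smi`. It assumes `nvidia-smi` is available inside the pod and that the driver exposes the `ecc.errors.uncorrected.volatile.total` query field. The helper names `parse_ecc_csv` and `read_ecc_counts` are made up for illustration and are not part of any RunPod tooling.

```python
import csv
import io
import shutil
import subprocess

# Fields requested from nvidia-smi; the ECC counter may come back as
# "[N/A]" when ECC reporting is unavailable on the device.
QUERY = "uuid,name,ecc.errors.uncorrected.volatile.total"


def parse_ecc_csv(text):
    """Parse headerless nvidia-smi CSV output into a list of dicts."""
    rows = []
    for rec in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(rec) != 3:
            continue  # skip blank or malformed lines
        uuid, name, count = (field.strip() for field in rec)
        rows.append({
            "uuid": uuid,
            "name": name,
            # None means the counter was not reported as a plain number.
            "uncorrected": int(count) if count.isdigit() else None,
        })
    return rows


def read_ecc_counts():
    """Run nvidia-smi (if present) and return the parsed ECC counters."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_ecc_csv(result.stdout)


if __name__ == "__main__":
    for gpu in read_ecc_counts():
        print(gpu)
```

Including the GPU UUID alongside the count could make it easier to tell whether repeated failures come from the same physical card when reporting the issue.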