bwa
RunPod
Created by bwa on 12/4/2024 in #⛅|pods
Faulty node?
Same as before: I got 'CUDA error: uncorrectable ECC error encountered' when I ran the kohya scripts. The container itself launched fine.
18 replies
Probably the same machine. It seems to start a bit slower than working ones.
Yep.
Also: hq57ofbzb1xmhb, cz6iu4pzb8z8h4
@yhlong00000 Hey there, I just got two more faulty instances: 6kxad780u6bda9, oweexcwlv8y62k. Same error, both on H100 NVL.
BTW, this happens when I run the kohya scripts, but the exact same script and config work on non-faulty nodes.
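For anyone trying to tell a faulty node from a script problem, here is a rough pre-flight check that could be run before starting training. It is only a sketch: it assumes `nvidia-smi` is available inside the pod, and the helper names (`parse_ecc_counts`, `gpus_with_ecc_errors`, `read_ecc_counts`) are made up for illustration. It reads the volatile uncorrected ECC counter, which resets when the driver reloads, so a zero does not prove the GPU is healthy; the error may only show up under load.

```python
import subprocess

# nvidia-smi query field: uncorrected ECC errors counted since the last driver reload.
QUERY = "ecc.errors.uncorrected.volatile.total"


def parse_ecc_counts(csv_text):
    """Return one entry per GPU; values that are not plain integers (e.g. '[N/A]') become None."""
    counts = []
    for line in csv_text.strip().splitlines():
        value = line.strip()
        counts.append(int(value) if value.isdigit() else None)
    return counts


def gpus_with_ecc_errors(counts):
    """Return the indices of GPUs that report at least one uncorrected ECC error."""
    return [i for i, c in enumerate(counts) if c is not None and c > 0]


def read_ecc_counts():
    """Ask nvidia-smi for the counter on every GPU (assumes nvidia-smi is on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_ecc_counts(result.stdout)
```

Calling `gpus_with_ecc_errors(read_ecc_counts())` right after the pod starts returns a list of suspect GPU indices; a non-empty list would be a reason to stop the pod and report its ID before spending time on training.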
Another one: g91ov3ym70j0rc
Can I get a refund for the time I wasted on these? I've had more than 10 of these in the past 2 days.
Another one. Two in a row: 3g93y1byjkjq1o
Didn't note them down but just encountered one again: 3sc3qsn1qhu0mz