RunPod · 5mo ago
AutoK

H100 multi-GPU settings

When I try to load weights from a checkpoint into my custom model on multiple GPUs, the weights are not loaded and the progress bar stops. I am using H100 x 7 on RunPod. When I ran the same test on my local server (A6000 x 6), it worked fine. Do you have any idea what might be wrong?
1 Reply
AutoK · 5mo ago
Also, when I load the weights using only one GPU (H100), it works fine.