H100 multi-GPU settings
When I try to load weights from a checkpoint into my custom model using multiple GPUs, the weights are not loaded and the progress bar stops.
I am using H100 x 7 on RunPod. When I ran the same thing on my local server (A6000 x 6), it worked fine.
Do you have any idea what could be causing this?
1 Reply
Also, when I load the weights using only one GPU (H100), it works fine.