Multi-node training with torchrun/Slurm
Has anyone here ever tried multi-node training on RunPod? I am thinking of setting this up, but if people have run into prohibitively slow network speeds I do not see a reason to.
6 Replies
you won't be able to do this without our multi-node feature, since you don't get access to internal IPs
multi-node??
it's a new service we are currently alpha testing; it will let you deploy multi-node clusters for training or other use cases with 100+ Gbps private networking
H100s are gonna run out soon
A100s / H100s, will likely open the beta next month
nice
i don't think i'll use that, haven't explored clusters that much yet hahah
seems like a great feature
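For anyone wondering why the internal IPs matter: with torchrun, every node has to be able to reach the head node at an address and port, so without a private network between pods there is nothing to point the workers at. A minimal sketch of what the per-node launch commands might look like, assuming a static (non-elastic) launch; the IP `10.0.0.1`, port `29500`, script name `train.py`, and the helper `torchrun_command` are all made up for illustration and are not anything RunPod-specific:

```python
import shlex


def torchrun_command(node_rank, nnodes, gpus_per_node, head_ip,
                     port=29500, script="train.py"):
    """Build the torchrun command line for one node of a multi-node job.

    head_ip must be an address that every node can reach (hypothetical here).
    """
    args = [
        "torchrun",
        f"--nnodes={nnodes}",                 # total number of nodes
        f"--nproc_per_node={gpus_per_node}",  # one worker process per GPU
        f"--node_rank={node_rank}",           # 0 is the head node
        f"--master_addr={head_ip}",           # head node's internal IP
        f"--master_port={port}",              # free TCP port on the head node
        script,
    ]
    return shlex.join(args)


# Print the command to run on each of two 8-GPU nodes (example values).
for rank in range(2):
    print(torchrun_command(rank, nnodes=2, gpus_per_node=8, head_ip="10.0.0.1"))
```

Under Slurm the same values would typically come from the job environment rather than being hard-coded, but the requirement is the same: all nodes need a route to the head node's address.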