RunPod · 2mo ago
Will

Networking Multiple Pods Together

I'm looking to train a distributed model on RunPod. When configuring torch.distributed or jax.distributed, you provide a coordinator address of the form ip:port (jax.distributed takes it as coordinator_address; torch.distributed reads it from MASTER_ADDR/MASTER_PORT or from an init_method URL such as tcp://ip:port).

Right now I'm unable to confirm that two pods can communicate with one another. I start one pod, expose a port in the 70000 range, SSH into it, run ip route to get the local IP, and then start a simple Python server with python -m http.server 70000. Then I SSH into the other pod and run curl <pod_1_local_ip>:<pod_1_70000_port>. This consistently fails.

My intuition is that the Docker containers don't belong to the same network. To my knowledge, we users don't have the privileges to set up such a network on the datacenter's machines; we can only modify containers on a one-off basis. Any guidance on enabling communication between pods would be greatly appreciated!
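A minimal sketch for testing raw TCP reachability between two pods without HTTP in the way, assuming Python 3.8+ on both pods. The script name netcheck.py and the port 7000 are hypothetical. TCP port numbers are 16-bit, so any value above 65535 (70000 included) cannot be bound at all; check_port rejects such values up front.

```python
import socket
import sys

MAX_PORT = 65535  # TCP port numbers are 16-bit


def check_port(port: int) -> int:
    """Reject port numbers that no socket can bind or connect to."""
    if not 0 < port <= MAX_PORT:
        raise ValueError(f"port {port} is outside 1-{MAX_PORT}")
    return port


def serve_once(port: int, host: str = "0.0.0.0") -> None:
    """Listen on host:port, accept a single connection, and reply b'pong'."""
    with socket.create_server((host, check_port(port))) as srv:
        conn, addr = srv.accept()
        with conn:
            print(f"connection from {addr[0]}:{addr[1]}")
            conn.sendall(b"pong")


def probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds and gets b'pong' back."""
    try:
        with socket.create_connection((host, check_port(port)), timeout=timeout) as s:
            return s.recv(4) == b"pong"
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    args = sys.argv[1:]
    if len(args) == 2 and args[0] == "serve":
        serve_once(int(args[1]))
    elif len(args) == 3 and args[0] == "probe":
        print("reachable" if probe(args[1], int(args[2])) else "unreachable")
    else:
        print("usage: netcheck.py serve PORT | netcheck.py probe HOST PORT")
```

Run python netcheck.py serve 7000 on pod 1 and python netcheck.py probe <pod_1_ip> 7000 on pod 2. If probing 127.0.0.1 from inside pod 1 succeeds but the cross-pod probe prints unreachable, the server is fine and the path between the pods is what is blocked.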
6 Replies
Will · 2mo ago
It would be most helpful if someone could explain how to find the host IP address. Since we only have access to the containers, there doesn't seem to be any way to get the host IP.
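From inside a container, the closest thing to a host address is usually the default gateway. Below is a minimal sketch that reads it from /proc/net/route, assuming a Linux container on a Docker-style bridge network (an assumption about how RunPod pods are wired, not something confirmed in this thread). On such a network the gateway is typically the host's side of the bridge, which pods on other machines generally cannot reach, so this helps with diagnosis rather than with connecting pods. The function name default_gateway is hypothetical.

```python
import socket
import struct


def default_gateway(route_table: str):
    """Return the IPv4 default gateway from /proc/net/route-formatted text, or None."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        # Destination 00000000 is the default route; flag 0x2 (RTF_GATEWAY) marks a gateway
        if destination == "00000000" and flags & 0x2:
            # The kernel prints addresses as little-endian hex, e.g. 010011AC -> 172.17.0.1
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None


if __name__ == "__main__":
    try:
        with open("/proc/net/route") as f:
            print(default_gateway(f.read()))
    except FileNotFoundError:
        print("/proc/net/route not available (not a Linux system?)")
```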
nerdylive · 2mo ago
You can request a public IP for pods in Community Cloud. It'll show up under the Connect button if you expose a TCP port.
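If a public IP and an exposed TCP port are available, the coordinator settings for both frameworks could be assembled from them roughly as below. This is a sketch: the IP 203.0.113.10 (a documentation-only address) and port 40123 are hypothetical placeholders for whatever the Connect button shows, and the Coordinator class is illustrative, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class Coordinator:
    """Hypothetical helper: the public endpoint of the pod that hosts rank 0."""

    public_ip: str      # e.g. the public IP listed under pod 1's Connect button
    external_port: int  # the external port mapped to pod 1's exposed TCP port

    def address(self) -> str:
        # The "ip:port" string jax.distributed.initialize takes as coordinator_address
        return f"{self.public_ip}:{self.external_port}"

    def torch_env(self, rank: int, world_size: int) -> dict:
        # Variables read by torch.distributed.init_process_group(init_method="env://")
        return {
            "MASTER_ADDR": self.public_ip,
            "MASTER_PORT": str(self.external_port),
            "RANK": str(rank),
            "WORLD_SIZE": str(world_size),
        }


if __name__ == "__main__":
    coord = Coordinator("203.0.113.10", 40123)  # placeholder values
    print(coord.address())
    print(coord.torch_env(rank=1, world_size=2))
    # On each pod, one of these would follow (not run here):
    #   jax.distributed.initialize(coordinator_address=coord.address(),
    #                              num_processes=2, process_id=rank)
    #   os.environ.update(coord.torch_env(rank, 2))
    #   torch.distributed.init_process_group(backend="nccl", init_method="env://")
```

Two caveats, neither verified for RunPod specifically: the externally visible port may differ from the port the process binds inside the pod, in which case a coordinator that binds the port taken from its own address would listen on the wrong one; and NCCL-style backends open further worker-to-worker connections beyond the coordinator port, so a single exposed port may not be enough for training traffic.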
Will · 2mo ago
I'm looking for a local IP: the IP of the host machine that the container is running inside.
nerdylive · 2mo ago
Oh, networking between pods. I think containers aren't connected to each other on a private network, even when they are in the same Secure Cloud datacenter (not sure). It's best to open a support ticket and ask about private networking between pods.
digigoblin · 2mo ago
Yeah private networking between pods is not supported.
flash-singh · 5w ago
@Will which GPUs are you using?