RunPod · 7d ago
kh

Global Networking

I am trying to use Global Networking. I have 1 master and 2 worker GPUs, all on different Pods but in the same data centre. It seems that the ports are not open between the Pods and only port 22 is. I also tried specifying a TCP port to expose when starting the Pods, but that does not work. I need to allow communication between the Pods for torch.distributed.
4 Replies
kh (OP) · 7d ago
RunPod Blog
Announcing Global Networking For Cross-Data Center Communication
RunPod is pleased to announce its launch of our Global Networking feature, which allows for cross-data center communication between pods. When a pod with the feature is deployed, your pods can communicate with each other over a virtual internal network facilitated by RunPod. This means that you can have pods
kh (OP) · 7d ago
What could be a solution? Do I need to set up SSH between the Pods?
Dj · 7d ago
You can't use your .runpod.internal subdomains at all? Just trying to understand the issue, sorry
kh (OP) · 7d ago
No worries. It seems the Pods cannot communicate with each other on port 29400.
I run nc -vv {}.runpod.internal 29400 on Pod B, with the global networking hostname for Pod A, and it says "port 29400 (tcp) failed: Connection refused".
ping {}.runpod.internal 29400 on Pod B for Pod A is fine.

Ah ok, I figured it out. It was an issue in my script. I thought I had set MASTER_ADDR and MASTER_PORT before launching it and was reading them via os.environ, but I wasn't, so the torch.distributed init was not pointing at the right hostname and port. Sorry!
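For anyone landing here with the same symptom: "Connection refused" usually means the host answered but nothing was listening on that port yet, which fits a master process that never started its rendezvous on the expected address. Below is a minimal sketch of a pre-flight check, not the OP's actual script. The helper names rendezvous_from_env and can_connect are hypothetical; the only things taken from the thread are the MASTER_ADDR / MASTER_PORT variables and the idea of probing the port the way nc does.

```python
import os
import socket


def rendezvous_from_env(env=None):
    """Return (host, port) from MASTER_ADDR / MASTER_PORT.

    Fails loudly if either variable is unset, instead of letting the
    distributed init silently use an unintended hostname or port.
    """
    env = os.environ if env is None else env
    missing = [k for k in ("MASTER_ADDR", "MASTER_PORT") if not env.get(k)]
    if missing:
        raise RuntimeError("not set: " + ", ".join(missing))
    return env["MASTER_ADDR"], int(env["MASTER_PORT"])


def can_connect(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds.

    Roughly the same check as `nc -vv host port`. On a worker this only
    succeeds once the master process is already listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical usage on a worker Pod, before calling
# torch.distributed.init_process_group(init_method="env://"),
# which reads MASTER_ADDR and MASTER_PORT from the environment:
#
#   host, port = rendezvous_from_env()
#   print("rendezvous target:", host, port)
#   print("reachable:", can_connect(host, port))
```

Printing the resolved target on every Pod before initialising makes a missing or mistyped variable obvious, and helps separate a script problem from a networking problem.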
