kh
RunPod
Created by kh on 3/29/2025 in #⛅|pods
Global Networking
i thought i had set MASTER_ADDR and MASTER_PORT before launching, and was reading them with os.environ. but i hadn't, so the torch.distributed init wasn't pointing at the right hostname and port. sorry!
11 replies
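A minimal sketch of the kind of guard that would have caught this, assuming the script reads MASTER_ADDR and MASTER_PORT from the environment. The helper name rendezvous_from_env is hypothetical, not something from the thread or from PyTorch. It fails loudly when either variable is missing instead of letting the init fall back to a wrong endpoint.

```python
import os


def rendezvous_from_env(env=None):
    """Return (host, port) from MASTER_ADDR / MASTER_PORT, raising if either is unset."""
    env = os.environ if env is None else env
    missing = [name for name in ("MASTER_ADDR", "MASTER_PORT") if not env.get(name)]
    if missing:
        raise RuntimeError("not set before launch: " + ", ".join(missing))
    port = int(env["MASTER_PORT"])  # raises ValueError on a non-numeric port
    if not 0 < port < 65536:
        raise ValueError(f"MASTER_PORT out of range: {port}")
    return env["MASTER_ADDR"], port


# Usage (commented out so importing this file never raises):
# host, port = rendezvous_from_env()
# print(f"rendezvous endpoint: tcp://{host}:{port}")
```

Calling this at the top of the training script, before any torch.distributed setup, turns a silent misconfiguration into an immediate error message naming the missing variable.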
RunPod
Created by kh on 3/29/2025 in #⛅|pods
Global Networking
ah ok, i figured it out. it was an issue in my script
11 replies
RunPod
Created by kh on 3/29/2025 in #⛅|pods
Global Networking
ping {}.runpod.internal 29400 from Pod B to Pod A works fine
11 replies
RunPod
Created by kh on 3/29/2025 in #⛅|pods
Global Networking
when i run nc -vv {}.runpod.internal 29400 on Pod B, using the global networking hostname for Pod A, it says "port 29400 (tcp) failed: Connection refused"
11 replies
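The same check can be sketched in Python, which helps when nc is not installed in the image. This is an illustration only: probe_tcp is a hypothetical helper name, and the hostname in the usage comment is a placeholder. As a general rule, "refused" means the host answered but nothing was accepting connections on that port, which usually points at the listening process rather than at the network between the pods.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connect and name the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"          # something is listening and accepted the connection
    except ConnectionRefusedError:
        return "refused"           # host reachable, but no listener on this port
    except socket.timeout:
        return "timeout"           # no answer at all within the timeout
    except socket.gaierror:
        return "unresolved"        # the hostname did not resolve
    except OSError:
        return "error"             # any other socket-level failure


# Usage from Pod B, with Pod A's global networking hostname filled in:
# print(probe_tcp("<pod-a-hostname>.runpod.internal", 29400))
```

If this reports "refused" while ping works, the first thing worth checking is whether the process on Pod A has actually started and bound to the expected port.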
RunPod
Created by kh on 3/29/2025 in #⛅|pods
Global Networking
no worries. it seems the Pods cannot communicate with each other on port 29400
11 replies
RunPod
Created by kh on 3/29/2025 in #⛅|pods
Global Networking
what could be a solution? do i need to set up SSH between the pods?
11 replies
RunPod
Created by dokoissho on 3/17/2025 in #⛅|pods
I lose my data every time I stop my pod
hope this helps!
9 replies
RunPod
Created by dokoissho on 3/17/2025 in #⛅|pods
I lose my data every time I stop my pod
to check, you can look at your pod instance in the UI at https://www.runpod.io/console/pods. if it has the field "volume:" followed by your network storage name, it means it's connected.
9 replies
RunPod
Created by dokoissho on 3/17/2025 in #⛅|pods
I lose my data every time I stop my pod
NOTE: if you deploy a pod that is not connected to your network storage, there will still be a workspace directory, but it will not be persistent.
9 replies
RunPod
Created by dokoissho on 3/17/2025 in #⛅|pods
I lose my data every time I stop my pod
c. once you launch, access your pod the way you normally do. once in, you can find your network storage under the workspace directory (i.e. cd workspace).
9 replies
RunPod
Created by dokoissho on 3/17/2025 in #⛅|pods
I lose my data every time I stop my pod
b. once that's done, your network storage will appear in https://www.runpod.io/console/user/storage. click deploy and it will take you to launch a pod that has access to your network storage.
9 replies
RunPod
Created by dokoissho on 3/17/2025 in #⛅|pods
I lose my data every time I stop my pod
a. pick the location that has the gpus you intend to use. this is important, because your network storage has to be in the same location as the gpus you want to use.
9 replies
RunPod
Created by dokoissho on 3/17/2025 in #⛅|pods
I lose my data every time I stop my pod
1. you need to create a network storage volume at https://www.runpod.io/console/user/storage
9 replies
RunPod
Created by kh on 3/13/2025 in #⛅|pods
how to connect to Network Volume after ssh-ing into a Pod?
fair point. it's mentioned in passing under troubleshooting. but i had to know to search "workspace" in the first place in order to find it.
11 replies
RunPod
Created by kh on 3/13/2025 in #⛅|pods
how to connect to Network Volume after ssh-ing into a Pod?
yes i do. actually sorry, my bad. it was a wrongly set path in my script, which was saving to the pod's disk space instead of to workspace
11 replies
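One way to guard against this kind of mistake, sketched under the assumption that the network volume is mounted at /workspace (the thread only says "workspace", so treat the exact path as an assumption). The helper name resolve_in_workspace is hypothetical. It builds every output path from the workspace root and refuses anything that would land elsewhere.

```python
from pathlib import Path


def resolve_in_workspace(relative, workspace="/workspace"):
    """Return an absolute path under `workspace`, or raise if it would escape it."""
    root = Path(workspace).resolve()
    # Joining with an absolute `relative` discards `root`, which the check below catches.
    target = (root / relative).resolve()
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{target} is outside {root}")
    return target


# Usage:
# ckpt = resolve_in_workspace("checkpoints/model.pt")
# ckpt.parent.mkdir(parents=True, exist_ok=True)
```

Routing every save through one function like this makes a wrongly set path fail at the first write rather than quietly filling the pod's own disk.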
RunPod
Created by kh on 3/13/2025 in #⛅|pods
how to connect to Network Volume after ssh-ing into a Pod?
starting to get a little frustrating
11 replies
RunPod
Created by kh on 3/13/2025 in #⛅|pods
how to connect to Network Volume after ssh-ing into a Pod?
it doesn't seem to be it. workspace is persistent, but i max out at 20gb even though my network volume is 75gb
11 replies
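A quick way to see which disk is actually filling up is to compare the capacity of the filesystems behind a few paths. This is a rough sketch: capacity_report is a hypothetical helper, the paths in the usage comment are assumptions, and a network filesystem may report a total that differs from the size provisioned for the volume.

```python
import shutil


def capacity_report(paths):
    """Map each path to the total/used/free size (in GB) of the filesystem backing it."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        report[path] = {
            "total_gb": round(usage.total / 1e9, 1),
            "used_gb": round(usage.used / 1e9, 1),
            "free_gb": round(usage.free / 1e9, 1),
        }
    return report


# Usage inside the pod (paths are assumptions):
# for path, stats in capacity_report(["/", "/workspace"]).items():
#     print(path, stats)
```

If the path the script writes to shows a total near the 20gb limit rather than the network volume's size, the output is going to the pod's own disk.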
RunPod
Created by kh on 3/13/2025 in #⛅|pods
how to connect to Network Volume after ssh-ing into a Pod?
ok figured it out. it's under workspace. might be useful to add this info to the documentation!
11 replies