H100 pod not connecting to network drive of the same region
I have a dual H100 pod that's supposed to be connected to a network drive (both on CA-MTL-1), but when I try to move data, run git status on a repo, or even start a Python script residing on the network drive, the terminal hangs. Seems like a network issue? I've tried spawning dual H100 pods multiple times, but I keep getting the same IP (probably the same hardware?), so nothing changes. Trying this from a machine with an RTX A5000 works fine!
Is there something I can do?
7 Replies
Same problem on CA region A40 pods
Dang, I've prepaid for that machine for a week, and it's currently in an unusable state (since I can't get anything off of the network drive)
@const @riverfog7 Are you still seeing this issue? This should've been resolved in the time this thread has been open.
i still am experiencing the issue
It works now
It worked over the weekend (with some sporadic freezes), to the point where I had 3 machines running (4 H100s total), but now they all seem to be stuck
advice #1 from runpod cs
advice #2 from runpod cs
i'm not entirely certain this information is relevant to datacenter level networking issues
even when I try to scp a file directly, the download stalls
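For anyone else hitting this, a timed probe can help show whether the hang is on the network drive itself rather than the pod. This is only a rough sketch: `probe_path` is a made-up helper name, and `/workspace` is an assumption about where the network drive is mounted, so swap in your own mount point. It lists a directory in a child process and gives up after a few seconds instead of freezing the terminal.

```python
import subprocess
import time


def probe_path(path, timeout_s=5.0):
    """Try to list `path`; return (ok, elapsed_seconds).

    ok is False if `ls` fails or does not finish within timeout_s.
    """
    start = time.monotonic()
    proc = subprocess.Popen(
        ["ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        returncode = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked on a stalled network mount
        # may not die right away, so we do not wait for it again here.
        proc.kill()
        return False, time.monotonic() - start
    return returncode == 0, time.monotonic() - start


if __name__ == "__main__":
    # "/workspace" is an assumed mount point; "/" is a local comparison.
    for candidate in ["/", "/workspace"]:
        ok, elapsed = probe_path(candidate)
        print(f"{candidate}: ok={ok} elapsed={elapsed:.2f}s")
```

If the local path returns instantly and the network drive path times out on the H100 pod but not on the A5000 one, that's at least something concrete to hand to support.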
