const Comments - Answer Overflow

const

RunPod

•Created by const on 2/21/2025 in #⛅｜pods

H100 pod not connecting to network drive of the same region

11 replies

i'm not entirely certain this information is relevant to datacenter level networking issues

advice #2 from runpod cs

Thank you for your patience as we work to get this issue resolved for you on your end. Currently, the pod and machine logs on our end are showing that they are in good standing.
 
At this time, have you been able to connect to a new pod? We believe that maybe clearing your cache or attempting a new browser might benefit you to start a pod up successfully.

advice #1 from runpod cs

I'm sorry to hear that you're experiencing issues with your dual H100 pods on CA-MTL-1. It's indeed unusual that the issue persists with A40 pods but not with A5000 GPUs.
From what you've described, it seems like the issue might be related to the network volume in the CA-MTL-1 region. Network volumes are generally slower for read/write operations compared to direct volumes. However, the extent of the slowdown you're experiencing is not normal.
 
One possible solution is to copy the data from the network volume into the container volume and then read/write to the model from the container volume. This workaround has helped other customers with similar issues, and I believe it could be effective here as well. Also, if you happen to have any timestamps of when you noticed the slowdown, that would be greatly appreciated.
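
For anyone trying the copy-to-container-volume workaround suggested above, here is a minimal sketch of what it could look like in Python. The function name `copy_to_container_volume` and the example paths are assumptions for illustration, not anything RunPod support provided; substitute whatever mount points your pod actually uses. It also records a UTC start time and the elapsed time, since support asked for timestamps.

```python
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path


def copy_to_container_volume(src: str, dst: str) -> dict:
    """Copy a directory tree from src to dst and report how long it took.

    src: directory on the (slow) network volume  -- hypothetical path
    dst: directory on the pod's container disk   -- hypothetical path
    """
    src_path, dst_path = Path(src), Path(dst)

    # Record a UTC timestamp so the run can be reported to support.
    started_at = datetime.now(timezone.utc).isoformat()
    t0 = time.perf_counter()

    # dirs_exist_ok=True (Python 3.8+) allows re-running into an existing dst.
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)

    elapsed = time.perf_counter() - t0
    # Total size of the regular files in the source tree.
    total_bytes = sum(p.stat().st_size for p in src_path.rglob("*") if p.is_file())

    return {
        "started_at_utc": started_at,
        "elapsed_seconds": elapsed,
        "bytes_copied": total_bytes,
    }


# Example call (paths are assumptions; adjust to your pod's mounts):
# report = copy_to_container_volume("/workspace/models", "/root/models")
# print(report)
```

If the copy itself hangs or crawls, the `started_at_utc` and `elapsed_seconds` values are the kind of detail support asked for. Note that this only helps when the network volume is slow but reachable; it will not help if the pod cannot read the volume at all.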

It worked over the weekend (with some sporadic freezes), to the point where I had 3 machines (4 H100s total), but they all seem to be stuck now

i still am experiencing the issue

Dang, I've prepaid for that machine for a week, and it's currently in an unusable state (since I can't get anything off of the network drive)

