const
RunPod
Created by const on 2/21/2025 in #⛅|pods
H100 pod not connecting to network drive of the same region
11 replies
I'm not entirely certain this information is relevant to datacenter-level networking issues.
advice #2 from runpod cs
Thank you for your patience as we work to get this issue resolved for you on your end. Currently, the pod and machine logs on our end are showing that they are in good standing.

At this time, have you been able to connect to a new pod? We believe that maybe clearing your cache or attempting a new browser might benefit you to start a pod up successfully.
advice #1 from runpod cs
I'm sorry to hear that you're experiencing issues with your dual H100 pods on CA-MTL-1. It's indeed unusual that the issue persists with A40 pods but not with A5000 GPUs.
From what you've described, it seems like the issue might be related to the network volume in the CA-MTL-1 region. Network volumes are generally slower for read/write operations compared to direct volumes. However, the extent of the slowdown you're experiencing is not normal.

One possible solution is to copy the data from the network volume into the container volume and then read/write to the model from the container volume. This workaround has helped other customers with similar issues, and I believe it could be effective here as well. Also, if you happen to have any timestamps of when you noticed the slowdown, that would be greatly appreciated.
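
For anyone trying the copy workaround described above, here is a minimal sketch in Python. It only assumes a source directory on the network volume and a destination directory on the container disk. The function name `stage_to_local` and both example paths are hypothetical and not taken from RunPod's documentation, so check where your volumes are actually mounted before running it.

```python
import shutil
from pathlib import Path


def stage_to_local(src: str, dst: str) -> Path:
    """Copy a directory tree from src to dst and return dst as a Path."""
    src_path = Path(src)
    dst_path = Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"source directory not found: {src_path}")
    # dirs_exist_ok=True (Python 3.8+) lets a re-run write over a partial earlier copy.
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    return dst_path


if __name__ == "__main__":
    # Hypothetical paths: replace with your network volume mount and a container-disk directory.
    local_dir = stage_to_local("/workspace/models", "/root/models")
    print(f"staged to {local_dir}")
```

After the copy, point the model loader at the local directory instead of the network volume. Note that this only helps if the network volume can still be read at all; it does not address a volume that hangs on every access.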
It worked over the weekend (with some sporadic freezes), to the point where I had 3 machines running (4 H100s total). Now they all seem to be stuck.
I am still experiencing the issue.
Dang, I've prepaid for that machine for a week, and it is currently in an unusable state (since I can't get anything off of the network drive).