H100 pod not connecting to network drive of the same region
advice #2 from runpod cs
Thank you for your patience as we work to get this issue resolved for you. Currently, the pod and machine logs on our end show that they are in good standing.
At this time, have you been able to connect to a new pod? We believe that clearing your browser cache or trying a different browser may help you start a pod successfully.
H100 pod not connecting to network drive of the same region
advice #1 from runpod cs
I'm sorry to hear that you're experiencing issues with your dual H100 pods on CA-MTL-1. It's indeed unusual that the issue persists with A40 pods but not with A5000 GPUs.
From what you've described, it seems like the issue might be related to the network volume in the CA-MTL-1 region. Network volumes are generally slower for read/write operations compared to direct volumes. However, the extent of the slowdown you're experiencing is not normal.
One possible solution is to copy the data from the network volume into the container volume and then read and write the model from the container volume. This workaround has helped other customers with similar issues, and I believe it could be effective here as well. Also, if you happen to have any timestamps of when you noticed the slowdown, those would be greatly appreciated.
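For anyone trying the workaround above, here is a minimal sketch of staging data onto the container volume before use. It is not from RunPod; the helper name `stage_to_local` and the example paths are hypothetical, and the actual mount point of your network volume depends on how the pod was configured. It also prints a UTC timestamp and the copy duration, which may be useful to pass along to support.

```python
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path


def stage_to_local(src: str, dst: str) -> Path:
    """Copy a directory tree from src (network volume) to dst (container disk).

    Hypothetical helper for illustration. Returns dst as a Path.
    """
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_dir():
        raise FileNotFoundError(f"source directory not found: {src_path}")

    started_at = datetime.now(timezone.utc).isoformat()
    start = time.perf_counter()
    # dirs_exist_ok=True (Python 3.8+) lets the copy be re-run into an existing directory.
    shutil.copytree(src_path, dst_path, dirs_exist_ok=True)
    elapsed = time.perf_counter() - start

    print(f"[{started_at}] copied {src_path} -> {dst_path} in {elapsed:.1f}s")
    return dst_path


# Example call (both paths are assumptions; adjust to your pod's mounts):
# local_dir = stage_to_local("/workspace/models/my-model", "/root/models/my-model")
```

After the copy, point the model loader at the returned local directory instead of the network volume path. Note that the container volume needs enough free space for the copy, and anything written there is not persisted on the network volume unless you copy it back.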