RunPod · 10mo ago
annah_do

Pod is stuck in a loop and does not finish creating

Hi, I'm trying to start a 1 x V100 SXM2 32GB pod with additional disk space (40 GB). It worked fine until yesterday; now when I try to create it, it gets stuck in this loop:
2024-02-23T11:34:34Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp 54.227.20.253:443: i/o timeout
2024-02-23T11:34:47Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:55Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:35:02Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
It did work with a larger GPU yesterday... Can anyone help me? thx
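(For context on the error: the "dial tcp ... i/o timeout" lines mean the machine hosting the pod cannot reach Docker Hub at all, rather than there being a problem with the image itself. A minimal sketch of a reachability check, runnable from any machine with Python and the requests package installed, is below; it only rules out a Hub-wide outage and cannot test the RunPod host's own network, which you have no access to before the pod starts.)
```python
# Minimal connectivity check against Docker Hub's registry endpoint.
# A reachable registry answers GET /v2/ with HTTP 401 (auth required);
# a timeout here matches the "dial tcp ... i/o timeout" seen in the pod logs.
import requests

try:
    r = requests.get("https://registry-1.docker.io/v2/", timeout=10)
    print(f"Registry reachable, status {r.status_code} (401 is expected without auth)")
except requests.exceptions.RequestException as exc:
    print(f"Registry unreachable: {exc}")
```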
9 Replies
ashleyk · 10mo ago
I assume this is the BG region in Community Cloud? I am having the same issues with A5000s in BG. @JM, can someone contact/unlist this host please? It's wasting our money when the host's internet is broken and it can't even pull the Docker image. Worst of all, I leave it to pull the Docker image and then it goes into an infinite loop and wastes my credits. I think we shouldn't be charged for Docker image pulls on pods, only for the time the container is actually running, like with serverless.
annah_do (OP) · 10mo ago
It's in the previous generation of Community Cloud.
ashleyk · 10mo ago
What's "previous generation of Community Cloud"? There isn't any such thing as a previous generation of Community Cloud.
annah_do (OP) · 10mo ago
I'm not sure how to express it... I select Community Cloud and then I see this. In the lower part it says "previous generation".
[screenshot attached]
ashleyk · 10mo ago
Oh, that's a heading for GPU type. Which specific GPU type are you using?
annah_do (OP) · 10mo ago
I was using 1 x V100 SXM2 32GB. If that's not the GPU type, then how would I find it?
ashleyk · 10mo ago
Yes, that's it, but in which region? I think it's the BG region; they offer that GPU type.
annah_do (OP) · 10mo ago
What do you mean by region? And what does BG stand for? Sorry, I'm new to this...
JM · 9mo ago
Good intel, both. Thanks! We are actually working on a big quality-control initiative. It includes:
- Putting very hard enforcement on minimum specs, and decommissioning machines that do not meet them, even if they were onboarded a long time ago.
- Much stricter and more automated verification.
- Automatic benchmarking of all servers, plus multi-GPU usage testing. All servers, no exception. Any low-performing ones that are out of the ordinary are going to get removed or upgraded.
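(As a rough illustration of the kind of per-server benchmarking and multi-GPU testing mentioned above, not RunPod's actual test suite, which isn't public: a minimal PyTorch sanity check might verify that every GPU is visible and measure matmul throughput on each. Sizes and iteration counts here are illustrative assumptions only.)
```python
# Sketch of a multi-GPU sanity/benchmark pass: confirm every GPU is visible
# and measure fp32 matmul throughput on each one. Thresholds are up to the
# operator; this is not any official acceptance test.
import time
import torch

assert torch.cuda.is_available(), "No CUDA devices visible"

n = 8192
for i in range(torch.cuda.device_count()):
    dev = torch.device(f"cuda:{i}")
    a = torch.randn(n, n, device=dev)
    b = torch.randn(n, n, device=dev)
    torch.cuda.synchronize(dev)
    start = time.time()
    for _ in range(10):
        a @ b
    torch.cuda.synchronize(dev)
    elapsed = time.time() - start
    # 10 matmuls at ~2*n^3 FLOPs each, converted to TFLOPS
    tflops = 10 * 2 * n**3 / elapsed / 1e12
    print(f"{torch.cuda.get_device_name(i)} (cuda:{i}): ~{tflops:.1f} TFLOPS fp32")
```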