RunPod · 13mo ago

Pod is stuck in a loop and does not finish creating

Hi, I'm trying to start a 1 x V100 SXM2 32GB pod with additional disk space (40 GB). It worked fine until yesterday. Now, when I try to create it, it gets stuck in this loop:
2024-02-23T11:34:34Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp i/o timeout
2024-02-23T11:34:47Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:55Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:35:02Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
It did work with a larger GPU yesterday... Can anyone help me? Thx
9 Replies
ashleyk · 13mo ago
I assume this is the BG region in Community Cloud? I am having the same issues with A5000s in BG. @JM, can someone contact or unlist this host, please? It's wasting our money when the host's internet is broken and it can't even pull the Docker image. Worst of all, I leave it to pull the Docker image, and then it goes into an infinite loop and wastes my credits. I think we shouldn't be charged for Docker image pulls for pods, only for the time the container is actually running, like with serverless.
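As a rough illustration of the loop described above, the following Python sketch counts the event types in a pod creation log and flags repeated image pull failures. The function names and the two-error threshold are hypothetical and are not part of any RunPod tooling. The sample lines are an excerpt copied from the log in the original post.

```python
import re
from collections import Counter

# Event names are taken from the log lines quoted in this thread.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+"
    r"(?P<event>create container|pending image pull|error pulling image)"
)


def summarize_pull_log(text):
    """Count how often each known event type appears in a pod creation log."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("event")] += 1
    return counts


def looks_like_pull_loop(text, min_errors=2):
    """Heuristic with an assumed threshold: repeated pull errors plus
    more than one container creation attempt suggests a retry loop."""
    counts = summarize_pull_log(text)
    return (
        counts["error pulling image"] >= min_errors
        and counts["create container"] > 1
    )


# Excerpt of the log from the original post.
SAMPLE = """\
2024-02-23T11:34:42Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp i/o timeout
2024-02-23T11:34:58Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:35:02Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
"""

if __name__ == "__main__":
    print(dict(summarize_pull_log(SAMPLE)))
    print("pull loop suspected:", looks_like_pull_loop(SAMPLE))
```

Both errors in the excerpt are timeouts reaching `registry-1.docker.io`, which is consistent with a network problem on the host rather than a problem with the image itself.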
annah_do (OP) · 13mo ago
it's in the previous generation of community cloud
ashleyk · 13mo ago
What's "previous generation of community cloud"? There isn't any such thing.
annah_do (OP) · 13mo ago
I'm not sure how to express it... I select Community Cloud and then I see this. In the lower part it says "previous generation".
[Attached screenshot; no description available]
ashleyk · 13mo ago
Oh, that's a heading for GPU type. Which specific GPU type are you using?
annah_do (OP) · 13mo ago
I was using 1 x V100 SXM2 32GB. If that's not the GPU type, then how would I find it?
ashleyk · 13mo ago
Yes, that's it, but in which region? I think it's the BG region; they offer that GPU type.
annah_do (OP) · 13mo ago
What do you mean by region? And what does BG stand for? Sorry, I'm new to this...
JM · 13mo ago
Good intel, both. Thanks! We are actually working on a big quality control initiative. It includes:
- Very strict enforcement of minimum specs, while decommissioning machines that do not meet them, even if they were onboarded a long time ago.
- Much stricter and more automated verification.
- Automatic benchmarking of all servers, plus multi-GPU usage testing. All servers, no exceptions. Any unusually low-performing ones are going to get removed or upgraded.
