RunPod · 10mo ago
annah_do

Pod is stuck in a loop and does not finish creating

Hi, I'm trying to start a 1 x V100 SXM2 32GB pod with additional disk space (40 GB). It worked fine until yesterday; now when I try to create it, it gets stuck in this loop:
2024-02-23T11:34:34Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp 54.227.20.253:443: i/o timeout
2024-02-23T11:34:47Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:55Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:35:02Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
It did work with a larger GPU yesterday... Can anyone help me? thx
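(For context on the error: the "dial tcp ... i/o timeout" lines mean the machine hosting the pod cannot reach Docker Hub at all, rather than there being a problem with the image itself. A minimal sketch of a reachability check, runnable from any machine with Python and the requests package installed, is below; it only rules out a Hub-wide outage and cannot test the RunPod host's own network, which you have no access to before the pod starts.)
```python
# Minimal connectivity check against Docker Hub's registry endpoint.
# A reachable registry answers GET /v2/ with HTTP 401 (auth required);
# a timeout here matches the "dial tcp ... i/o timeout" seen in the pod logs.
import requests

try:
    r = requests.get("https://registry-1.docker.io/v2/", timeout=10)
    print(f"Registry reachable, status {r.status_code} (401 is expected without auth)")
except requests.exceptions.RequestException as exc:
    print(f"Registry unreachable: {exc}")
```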
9 Replies
ashleyk · 10mo ago
I assume this is the BG region in Community Cloud? I am having the same issues with A5000s in BG. @JM, can someone contact/unlist this host please? It's wasting our money when the host's internet is broken and it can't even pull the Docker image. Worst of all, I leave it to pull the Docker image and then it goes into an infinite loop and wastes my credits. I think we shouldn't be charged for Docker image pulls on pods, only for the time the container is actually running, like with serverless.
annah_do (OP) · 10mo ago
It's in the previous generation of Community Cloud.
ashleyk · 10mo ago
What's "previous generation of Community Cloud"? There isn't any such thing as a previous generation of Community Cloud.
annah_do (OP) · 10mo ago
I'm not sure how to express it... I select Community Cloud and then I see this. In the lower part it says "previous generation".
[screenshot attached]
ashleyk · 10mo ago
Oh, that's a heading for GPU type. Which specific GPU type are you using?
annah_do (OP) · 10mo ago
I was using 1 x V100 SXM2 32GB. If that's not the GPU type, then how would I find it?
ashleyk · 10mo ago
Yes, that's it, but in which region? I think it's the BG region; they offer that GPU type.
annah_do (OP) · 10mo ago
What do you mean by region? And what does BG stand for? Sorry, I'm new to this...
JM · 9mo ago
Good intel, both. Thanks! We are actually working on a big quality-control initiative. It includes:
- Putting very hard enforcement on minimum specs, and decommissioning machines that do not meet them, even if they were onboarded a long time ago.
- Much stricter and more automated verification.
- Automatic benchmarking of all servers, plus multi-GPU usage testing. All servers, no exception. Any low-performing ones that are out of the ordinary are going to get removed or upgraded.
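(As a rough illustration of the kind of per-server benchmarking and multi-GPU testing mentioned above, not RunPod's actual test suite, which isn't public: a minimal PyTorch sanity check might verify that every GPU is visible and measure matmul throughput on each. Sizes and iteration counts here are illustrative assumptions only.)
```python
# Sketch of a multi-GPU sanity/benchmark pass: confirm every GPU is visible
# and measure fp32 matmul throughput on each one. Thresholds are up to the
# operator; this is not any official acceptance test.
import time
import torch

assert torch.cuda.is_available(), "No CUDA devices visible"

n = 8192
for i in range(torch.cuda.device_count()):
    dev = torch.device(f"cuda:{i}")
    a = torch.randn(n, n, device=dev)
    b = torch.randn(n, n, device=dev)
    torch.cuda.synchronize(dev)
    start = time.time()
    for _ in range(10):
        a @ b
    torch.cuda.synchronize(dev)
    elapsed = time.time() - start
    # 10 matmuls at ~2*n^3 FLOPs each, converted to TFLOPS
    tflops = 10 * 2 * n**3 / elapsed / 1e12
    print(f"{torch.cuda.get_device_name(i)} (cuda:{i}): ~{tflops:.1f} TFLOPS fp32")
```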