annah_do
RunPod
Created by annah_do on 2/23/2024 in #⛅|pods
Pod is unable to find/use GPU in python
Hi, I'm trying to connect to this pod:

RunPod Pytorch 2.2.10
ID: zgel6p985mjmmn
1 x A30
8 vCPU
31 GB RAM
runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
On-Demand - Community Cloud
Running
40 GB Disk
20 GB Pod Volume
Volume Path: /workspace

I can see that it has a GPU with nvidia-smi, and the CUDA and PyTorch versions seem correct, but I cannot use the GPU with torch... Can anyone help? Best

```
root@54be7382bee1:~# python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> torch.__version__
'2.2.0+cu121'
>>> exit()
root@54be7382bee1:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```
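For anyone reproducing this, a minimal sketch that gathers the version details from both sides (what the host driver reports via `nvidia-smi` and what PyTorch reports) into one place for a support thread. The helper names `parse_smi_header` and `collect_report` are made up for illustration, and the parsing assumes the usual `Driver Version: ... CUDA Version: ...` banner that `nvidia-smi` prints. It only collects information and does not diagnose the cause of the warning.

```python
import re
import shutil
import subprocess


def parse_smi_header(smi_output: str) -> dict:
    """Extract the driver and CUDA versions from nvidia-smi's banner text.

    Hypothetical helper: keys are omitted when a field is not found.
    """
    info = {}
    driver = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    if driver:
        info["driver"] = driver.group(1)
    if cuda:
        info["driver_cuda"] = cuda.group(1)
    return info


def collect_report() -> dict:
    """Gather what the driver and PyTorch each report, without raising."""
    report = {}
    # Only call nvidia-smi if it is actually on PATH.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        report.update(parse_smi_header(result.stdout))
    try:
        import torch

        report["torch"] = torch.__version__
        report["torch_cuda"] = torch.version.cuda  # CUDA version torch was built with
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None
    return report
```

Calling `collect_report()` inside the pod and pasting the resulting dictionary alongside the warning text gives responders the driver and build versions side by side.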
27 replies
RunPod
Created by annah_do on 2/23/2024 in #⛅|pods
Pod is stuck in a loop and does not finish creating
Hi, I'm trying to start a 1 x V100 SXM2 32GB pod with additional disk space (40 GB). It worked fine until yesterday. Now, when I try to create it, it gets stuck in this loop:
2024-02-23T11:34:34Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:42Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:47Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp 54.227.20.253:443: i/o timeout
2024-02-23T11:34:47Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:55Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z create container runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:34:58Z pending image pull runpod/pytorch:2.1.0-py3.10-cuda11.8.0-devel-ubuntu22.04
2024-02-23T11:35:02Z error pulling image: Error response from daemon: Get "https://registry-1.docker.io/v2/": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
It did work with a larger GPU yesterday... Can anyone help me? Thanks
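To make a loop like this easier to report, here is a rough sketch that counts the pod events by type. It assumes each log line has the form `<timestamp> <message>` as in the excerpt above; `summarize_events` is a hypothetical helper name, not part of any RunPod tooling.

```python
from collections import Counter

# Message prefixes seen in the log excerpt above.
KNOWN_EVENTS = ("create container", "pending image pull", "error pulling image")


def summarize_events(log_text: str) -> Counter:
    """Count log lines by event type; unrecognized messages go under 'other'."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.strip().split(" ", 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        message = parts[1]
        for event in KNOWN_EVENTS:
            if message.startswith(event):
                counts[event] += 1
                break
        else:
            counts["other"] += 1
    return counts
```

A result where `create container` and `pending image pull` keep repeating alongside `error pulling image` entries matches the pattern shown here: the image pull from `registry-1.docker.io` times out and the create step is retried.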
15 replies