Pod is unable to find/use GPU in python
Hi,
I'm trying to connect to this pod:
RunPod Pytorch 2.2.0
ID: zgel6p985mjmmn
1 x A30
8 vCPU 31 GB RAM
runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
On-Demand - Community Cloud
Running
40 GB Disk
20 GB Pod Volume
Volume Path: /workspace
I can see that it has a GPU with nvidia-smi, and the CUDA and PyTorch versions seem correct, but I cannot use the GPU with torch...
Can anyone help?
Best
```
root@54be7382bee1:~# python
Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
/usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:141: UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> torch.__version__
'2.2.0+cu121'
>>> exit()
root@54be7382bee1:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:16:06_PDT_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
```
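For context on the version mix above: the wheel is built against CUDA 12.1 (`'2.2.0+cu121'`) while the driver reports 12.3, and that pairing is normally fine because the driver only needs to support a CUDA version at least as new as the toolkit. Here is a minimal sketch of that check; the helper names are made up for illustration, not a PyTorch or RunPod API:

```python
# Sketch: the driver's reported CUDA version (nvidia-smi header) must be
# >= the toolkit version the PyTorch wheel was built against.
def parse_version(v: str) -> tuple[int, int]:
    """Parse the major.minor part of a CUDA version string, e.g. '12.1.105'."""
    major, minor = v.split(".")[:2]
    return int(major), int(minor)

def driver_supports_toolkit(driver_cuda: str, toolkit_cuda: str) -> bool:
    """True if the driver's CUDA version is at least the toolkit's."""
    return parse_version(driver_cuda) >= parse_version(toolkit_cuda)

# Values from this pod: driver reports 12.3, wheel is cu121.
print(driver_supports_toolkit("12.3", "12.1"))  # → True
```

So version-wise the pod looks compatible, which points at the driver itself (the beta 12.3 stack discussed below) rather than a toolkit mismatch.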
```
root@54be7382bee1:~# nvidia-smi
Fri Feb 23 11:56:47 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A30                     On  | 00000000:00:06.0 Off |                   On |
| N/A   45C    P0              31W / 165W |      0MiB / 24576MiB |     N/A      Default |
|                                         |                      |              Enabled |
+-----------------------------------------+----------------------+----------------------+
```
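If you'd rather check the driver-side CUDA version from a script than eyeball the table, it can be pulled out of nvidia-smi's header line. A small sketch (the helper name is hypothetical):

```python
import re

def driver_cuda_version(smi_output: str) -> str:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's header text."""
    match = re.search(r"CUDA Version:\s*([\d.]+)", smi_output)
    if match is None:
        raise RuntimeError("could not find CUDA version in nvidia-smi output")
    return match.group(1)

# On a pod you would capture the real output, e.g.:
#   out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
# Here, the header line from this thread:
header = "| NVIDIA-SMI 545.23.08    Driver Version: 545.23.08    CUDA Version: 12.3 |"
print(driver_cuda_version(header))  # → 12.3
```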
Maybe because the machine is running CUDA 12.3 which is not production ready.
most machines use CUDA 12.3 and with the 48GB GPU it works
@JM said they should all be on 12.2 because 12.3 is not production ready.
I haven't seen any machines on 12.3 personally.
hm just double checked and you are right. my 48GB GPU is actually on 12.2...
will keep an eye open for this in the future...
@ashleyk how do we use 12.2? I spawned an H100 SXM5 pod with the image runpod/pytorch:2.1.1-py3.10-cuda12.1.1-devel-ubuntu22.04, but nvidia-smi still shows CUDA 12.3.
ID: axwx9s1edwts9x
Facing the same issue as @annah_do
This happens even if I change my template to: runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04
Solution
@Dhruv Mullick I don't think it has to do with the image... If you select the pod from the RunPod website, there is a filter button at the top and then a drop-down menu where you can select 12.2 under "Allowed CUDA Versions".
as @ashleyk pointed out earlier, 'the machine is running CUDA 12.3 which is not production ready'. If I select 12.2, it works.
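For anyone creating pods programmatically rather than through the web UI, the same filter would be expressed as an allowed-CUDA-versions list that simply excludes 12.3. This is a hypothetical sketch of the request body; the exact field name is an assumption, so verify it against your RunPod API/SDK version:

```python
# Hypothetical pod-create request body. `allowed_cuda_versions` mirrors the
# "Allowed CUDA Versions" filter in the RunPod web UI (the field name is an
# assumption -- check your SDK/API docs before relying on it).
pod_request = {
    "name": "pytorch-a30",
    "image_name": "runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04",
    "gpu_type_id": "NVIDIA A30",
    "cloud_type": "COMMUNITY",
    # Pin to production-ready driver stacks; 12.3 is deliberately absent.
    "allowed_cuda_versions": ["12.0", "12.1", "12.2"],
}
print("12.3" in pod_request["allowed_cuda_versions"])  # → False
```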
Awesome, thank you @annah_do ! I thought it was the image that was controlling this.
Even with CUDA 12.2 I'm seeing the same error now.
How did you install torch?
Probably conda breaking stuff, conda sucks
I just used the torch from the latest torch + CUDA template (I think it was runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04, but I've now deleted the pod).
RunPod templates don't use conda though, as far as I'm aware. Your application probably installed it.
This is a clean VM, with no other commands executed but the ones shown above 😅
That's not true, it does not say `(torch_env)` in front of my prompt like yours does with a clean pod. That only happens when that crap conda gets installed.
And it shows that CUDA is available on A100.
So I don't know what you are doing, but you are clearly doing something wrong.
Hey guys!
Yep, thanks @ashleyk
Indeed, it's possible that some machines slip through with 12.3, but the bulk are on 12.2. As already mentioned, 12.3 is beta and we recommend production-ready drivers 🙂