RuntimeError: Found no NVIDIA driver on your system
I've been pulling my hair out over this for 2 weeks now. I'm building a container with ComfyUI and the IDM-VTON custom_node, and whenever I run it (serverless and on a pod) it gives me the "Found no driver" message. This container runs without a problem on my home 4090, and I'm using a 4090 on RunPod too.
When I run the same container without the IDM-VTON custom node, it runs fine.
The container is the following: vazazon/comfyuivenv:dev
What am I missing?
What are you using as a base image?
Maybe you don't have the cuda libraries/driver in your image?
My base image is madiator2011/better-comfyui:light, which is one of the images RunPod offers. It has all the drivers required inside the container image. I'm running this container on my personal machine, which has a 4090, and it runs properly.
Do you know if runpod passes --gpus=all to the running pod?
ic
I've also tried official base images offered by runpod
yeah i think runpod does proper handling to pass the gpu access
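For local testing, the usual sanity check that GPU passthrough works (the CUDA image tag here is just an example) is:
docker run --rm --gpus=all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
If that prints the GPU table, the host side is fine.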
It's very strange that ComfyUI works properly up until I install the IDM-VTON custom node, and from there on ComfyUI refuses to load due to the driver issue
it gives me the "Found no driver" message.
can you give a screenshot of this message
where is it?
This is the custom node I'm installing: https://github.com/TemryL/ComfyUI-IDM-VTON
Hmm try setting 0 workers and back
and report this endpoint to runpod
from the contact button
sorry, what do you mean "and back"?
How many max workers do you have now?
Set it to 0 (that deletes all workers), then back to your current amount.
ok, I'm building the EP again and will provide it to the support once it is up
What is EP?
Endpoint
I've spent almost $20 on these experiments so far without success...
Yeah, I have no idea why that is
Hopefully, if they confirm the problem is on RunPod's side, you can get some credits back
Don't use CUDA 12.5.1 image
oh ya... they're not yet supported 🤣
the machines are like 12.4 (max), right?
Most are 12.1
yeah
i thought they already released some on 12.5
You have to set the filters for CUDA version on the endpoint if you want to use anything higher than 12.1
That's why it can't find the GPU
hmm, but if that's the case, it usually errors with something like a CUDA compatibility message, right?
I've tried all kinds of images. Everything works when loading ComfyUI until I add the IDM-VTON custom node. The diffusers_load.py import chain accesses CUDA and fails:
File "/workspace/ComfyUI/nodes.py", line 21, in <module>
2024-07-30T07:35:28.884890696Z import comfy.diffusers_load
2024-07-30T07:35:28.884909706Z File "/workspace/ComfyUI/comfy/diffusers_load.py", line 3, in <module>
2024-07-30T07:35:28.885133662Z import comfy.sd
2024-07-30T07:35:28.885148463Z File "/workspace/ComfyUI/comfy/sd.py", line 5, in <module>
2024-07-30T07:35:28.885478892Z from comfy import model_management
2024-07-30T07:35:28.885491072Z File "/workspace/ComfyUI/comfy/model_management.py", line 119, in <module>
2024-07-30T07:35:28.885740719Z total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
2024-07-30T07:35:28.885750849Z File "/workspace/ComfyUI/comfy/model_management.py", line 88, in get_torch_device
2024-07-30T07:35:28.885848452Z return torch.device(torch.cuda.current_device())
2024-07-30T07:35:28.885876893Z File "/venv/lib/python3.10/site-packages/torch/cuda/init.py", line 778, in current_device
2024-07-30T07:35:28.886137860Z _lazy_init()
2024-07-30T07:35:28.886148180Z File "/venv/lib/python3.10/site-packages/torch/cuda/init.py", line 293, in _lazy_init
2024-07-30T07:35:28.886281774Z torch._C._cuda_init()
2024-07-30T07:35:28.886360026Z RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
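For context, the crash happens during an import-time CUDA init. A guard like this (a sketch of the general pattern, not ComfyUI's actual code) only touches CUDA when a driver is visible:
import torch

# is_available() returns False when no driver is visible,
# instead of raising the RuntimeError that _lazy_init() throws
if torch.cuda.is_available():
    device = torch.device(torch.cuda.current_device())
else:
    device = torch.device("cpu")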
I've opened support case with all the information
This is 100% because you are using the wrong CUDA version base image
Fix that
Otherwise, filter to CUDA 12.5 if it's even available
I see CUDA 12.5 is available in the dropdown
Edit your endpoint, go to advanced and check CUDA 12.5
CUDA is not forwards compatible, you can't use a CUDA 12.5 docker image on a 12.1 host
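A quick way to compare the two sides from inside the container (assuming nvidia-smi and torch are present):
nvidia-smi    # header shows the host driver's maximum supported CUDA version
python -c "import torch; print(torch.version.cuda)"    # CUDA version torch was built against
As a rule of thumb, the second number should not be higher than the first.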
ah yea
I've tried defining CUDA 12.5 and the issue still exists. Also, the log clearly states I'm running CUDA 12.1 (not that the host is on 12.1)
I've chosen 12.5 in the dropdown of the endpoint.
If you run nvidia-smi in the command line, what happens?
This log does, your previous log said 12.5.1
Here
Is the docker image code on Github?
Looks like there may be an issue with torch installation
The container is on Docker Hub: vazazon/comfyuivenv:dev - the base image is madiator2011/better-comfyui:light, and the only change I made that is related to the error is opening the ComfyUI GUI and adding the IDM-VTON custom node.
yeah could be your torch version is messed up
How come the container works on my machine smoothly?
How can it work on your machine? You don't have the serverless infrastructure on your machine
RunPod provides a way to run it locally: if there is a local file named test_input.json, it will be used as if you had uploaded it through the RunPod API
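A minimal sketch of that local flow with the RunPod Python SDK (the handler body and input fields here are made up for illustration):
# handler.py
import runpod

def handler(job):
    # job["input"] is the "input" object from test_input.json (or from the API request)
    prompt = job["input"].get("prompt", "")
    return {"echo": prompt}

runpod.serverless.start({"handler": handler})
With a test_input.json like {"input": {"prompt": "hello"}} next to it, running python handler.py executes that job once locally.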
I run it using:
docker run -it --rm --gpus=all -p3000:3000 comfyuivenv:dev
madiator2011/better-comfyui:light is based on CUDA 12.1
And this is why I don't understand why it fails to run on runpod
You know my image is not made for serverless 🙂
I also tried during the last 2 weeks to use runpod's suggested images with the same results
Better Comfy UI is made for Pods not serverless
I tried to run it on pod too - same results
this is why I said I'm pulling my hair out over this issue
just testing it out
ComfyUI-IDM-VTON?
@mikmak5595 do you have a workflow?
kinda need more info if I want to test
so I installed without issues
@mikmak5595 I see TripleDES might be causing issues
For now you need to install this:
cryptography<43.0.0
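In the image that's just:
pip install "cryptography<43.0.0"
(the quotes keep the shell from treating < as a redirect)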
Looks like it's just a deprecation warning though, not an actual error
I know it caused issues when I worked on the whisperx worker
@Madiator2011 are you an AI? You are here 24/7
No he is a person
hmm
It was a joke
No errors on my side
TemryL/ComfyUI-IDM-VTON installed fine without issues
This is the workflow
If you run my container without any command it should process this workflow automatically
The problem is not with installing IDM-VTON, it's with running the container after IDM-VTON has been installed. My container already has this module installed along with all the required models. When I run it without any CMD, it fails during ComfyUI startup, as it calls the custom nodes' init functions, which try to use CUDA
@mikmak5595 want me to help you with setup?
Yes please. I have the docker image vazazon/comfyuivenv:dev that, when executed both in pod and serverless mode, gives the "RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx" error. The same container runs fine on my machine using:
docker run -it --rm --gpus=all vazazon/comfyuivenv:dev
what is your api input?
also noticed you have 2 venvs in the container
@mikmak5595 would it be an issue to share the Dockerfile and the code of the worker you use?
This is the workflow. I don't have a Dockerfile, as I took RunPod's base image, installed all the custom_nodes on it manually, and then ran docker commit
Since the issue is with the image itself, it's most likely a venv mess-up
I'm not sure why you have taken such a hard way of doing it
I also tried the non-venv way. Once I installed the IDM-VTON custom_node, things started to mess up
on serverless you want to avoid venvs
you do not plan to use network storage?
no, I placed all models and other data on the container image
btw, it fails to run both on pod and on serverless
Yeah, because you totally messed up the docker image
It wasn't built correctly, so the NVIDIA drivers did not get injected
Can you explain what you mean by "it wasn't built correctly"?
Build it with a Dockerfile and then the docker build command
I did build the base image using docker build, and then I opened bash inside it and ran pip install on all the custom_nodes' requirements.txt files
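For reference, the reproducible way is to put those manual steps into the Dockerfile so they run at build time; a sketch (the base image and the single node here are assumptions, the path matches your traceback):
FROM madiator2011/better-comfyui:light
WORKDIR /workspace/ComfyUI/custom_nodes
# clone the custom node and install its Python dependencies at build time
RUN git clone https://github.com/TemryL/ComfyUI-IDM-VTON.git && \
    pip install -r ComfyUI-IDM-VTON/requirements.txt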
It's late for me, but tomorrow I can try to make you a template
that would be awesome!
Here is the Dockerfile and some additional minor changes to the https://github.com/blib-la/runpod-worker-comfy code.
To build it, use the following command:
docker build -t vazazon/runpod-worker-comfy:dev-full2 --target base --platform linux/amd64 .
I'm uploading the resulting docker image to Docker Hub and will try it both as a pod and in serverless mode on RunPod, then update here
It's better to fork the repo and push your changes to GitHub than to post zip files of changes.
Let's first wait to see if the image runs properly on runpod.
Update: the image I created from the dockerfile above works!
The next challenge with serverless mode: running the 2nd workflow causes out-of-memory on the GPU. I will have to check how I can get ComfyUI to free up GPU memory at the beginning/end of each workflow
Nice
Sounds like you need to pick a GPU with more VRAM.
It happens only on the 2nd API invocation, which means ComfyUI does not free up GPU VRAM after running the workflow. Any suggestions on how to achieve that? Maybe the RunPod API provides a way to do it?
Here is a link on unloading models: https://www.reddit.com/r/comfyui/comments/194sehe/is_there_a_unload_model_node_im_sure_i_used_one/
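Recent ComfyUI builds also expose a /free endpoint that unloads models on request; something like this between jobs may be worth trying (assuming the default 127.0.0.1:8188 listener):
curl -X POST http://127.0.0.1:8188/free \
  -H "Content-Type: application/json" \
  -d '{"unload_models": true, "free_memory": true}'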
Why not keep the model in memory unless you're switching models?
It's the only way I know of to free up VRAM.
I was able to resolve the issue by adding --lowvram to the ComfyUI command line
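For reference, the full launch line then looks something like this (assuming the stock ComfyUI entrypoint):
python main.py --listen --lowvram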
@mikmak5595 just checking up on you. So the issue is solved?