RunPod•4mo ago
mikmak5595

RuntimeError: Found no NVIDIA driver on your system

I've been pulling my hair out over this for 2 weeks now. I'm building a container with ComfyUI and the IDM-VTON custom_node, and whenever I run it (serverless and on a pod) it gives me the "Found no driver" message. This container runs without a problem on my home 4090, and I'm using a 4090 on RunPod too. When I run the same container without the IDM-VTON custom node, it runs fine. The container is the following: vazazon/comfyuivenv:dev What am I missing?
81 Replies
Marcus
Marcus•4mo ago
What are you using as a base image?
nerdylive
nerdylive•4mo ago
Maybe you don't have the cuda libraries/driver in your image?
mikmak5595
mikmak5595OP•4mo ago
My base image is madiator2011/better-comfyui:light which is part of what runpod offers. It has all the drivers required inside the container image. I'm running this container on my personal machine that has 4090 and it runs properly. Do you know if runpod passes --gpus=all to the running pod?
nerdylive
nerdylive•4mo ago
ic
mikmak5595
mikmak5595OP•4mo ago
I've also tried official base images offered by runpod
nerdylive
nerdylive•4mo ago
yeah i think runpod does proper handling to pass the gpu access
mikmak5595
mikmak5595OP•4mo ago
It's very strange that comfyui works properly up until I install the IDM-VTON custom nodes and from there on, comfyui refuses to load due to the driver issue
nerdylive
nerdylive•4mo ago
> it gives me the "Found no driver" message
Can you give a screenshot of this message? Where is it?
mikmak5595
mikmak5595OP•4mo ago
This is the custom node I'm installing: https://github.com/TemryL/ComfyUI-IDM-VTON
GitHub
GitHub - TemryL/ComfyUI-IDM-VTON: ComfyUI adaptation of IDM-VTON fo...
ComfyUI adaptation of IDM-VTON for virtual try-on. - TemryL/ComfyUI-IDM-VTON
nerdylive
nerdylive•4mo ago
Hmm, try setting workers to 0 and back, and report this endpoint to RunPod via the contact button
mikmak5595
mikmak5595OP•4mo ago
sorry, what do you mean "and back"?
nerdylive
nerdylive•4mo ago
However many max workers you have now: set it to 0 (which deletes all workers), then back to your current amount
mikmak5595
mikmak5595OP•4mo ago
ok, I'm building the EP again and will provide it to the support once it is up
nerdylive
nerdylive•4mo ago
What is EP?
mikmak5595
mikmak5595OP•4mo ago
Endpoint. I've spent almost $20 on these experiments so far without success...
nerdylive
nerdylive•4mo ago
Yeah, I have no idea why that is. Hopefully, if they confirm the problem is on RunPod's side, you can get some credits back
Marcus
Marcus•4mo ago
Don't use a CUDA 12.5.1 image
nerdylive
nerdylive•4mo ago
Oh yeah... they're not yet supported 🤣 the machines are like 12.4 (max), right?
Marcus
Marcus•4mo ago
Most are 12.1
nerdylive
nerdylive•4mo ago
yeah i thought they already released some on 12.5
Marcus
Marcus•4mo ago
You have to set the filters for CUDA version on the endpoint if you want to use anything higher than 12.1. That's why it can't find the GPU
nerdylive
nerdylive•4mo ago
Hmm, but it usually errors with something like "CUDA incompatibility", right? If this is the case
mikmak5595
mikmak5595OP•4mo ago
I've tried all kinds of images. All work well when loading ComfyUI until I add the IDM-VTON custom node. The diffusers_load.py code accesses CUDA and fails:
File "/workspace/ComfyUI/nodes.py", line 21, in <module>
    import comfy.diffusers_load
File "/workspace/ComfyUI/comfy/diffusers_load.py", line 3, in <module>
    import comfy.sd
File "/workspace/ComfyUI/comfy/sd.py", line 5, in <module>
    from comfy import model_management
File "/workspace/ComfyUI/comfy/model_management.py", line 119, in <module>
    total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
File "/workspace/ComfyUI/comfy/model_management.py", line 88, in get_torch_device
    return torch.device(torch.cuda.current_device())
File "/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 778, in current_device
    _lazy_init()
File "/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 293, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
mikmak5595
mikmak5595OP•4mo ago
I've opened support case with all the information
Marcus
Marcus•4mo ago
This is 100% because you are using the wrong CUDA version base image. Fix that. Otherwise, filter to CUDA 12.5 if it's even available. I see CUDA 12.5 is available in the dropdown. Edit your endpoint, go to advanced, and check CUDA 12.5. CUDA is not forwards compatible: you can't use a CUDA 12.5 Docker image on a 12.1 host
nerdylive
nerdylive•4mo ago
ah yea
mikmak5595
mikmak5595OP•4mo ago
I've tried selecting CUDA 12.5 and the issue still exists. Also, the log clearly states I'm running CUDA 12.1 (not that the host is 12.1)
mikmak5595
mikmak5595OP•4mo ago
I've chosen 12.5 in the dropdown of the endpoint.
nerdylive
nerdylive•4mo ago
If you run nvidia-smi, what happens in the command line?
Marcus
Marcus•4mo ago
This log does; your previous log said 12.5.1. Is the Docker image code on GitHub? Looks like there may be an issue with the torch installation
mikmak5595
mikmak5595OP•4mo ago
The container is on Docker Hub: vazazon/comfyuivenv:dev — the base image is madiator2011/better-comfyui:light, and the only change I made that is related to the error is opening the ComfyUI GUI and adding the IDM-VTON custom node.
nerdylive
nerdylive•4mo ago
yeah could be your torch version is messed up
mikmak5595
mikmak5595OP•4mo ago
How come the container works on my machine smoothly?
Marcus
Marcus•4mo ago
How can it work on your machine? You don't have the serverless infrastructure on your machine
mikmak5595
mikmak5595OP•4mo ago
RunPod provides a way to run it locally: if there is a local file named test_input.json, it will be used as if you uploaded it through the RunPod API. I run it using: docker run -it --rm --gpus=all -p 3000:3000 comfyuivenv:dev
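For context, the local test harness reads a file shaped roughly like this — a minimal sketch of test_input.json, assuming the standard runpod-python worker input envelope (the inner payload depends entirely on your handler; the "workflow" key here is just an illustration):

```json
{
  "input": {
    "workflow": { "note": "your ComfyUI workflow JSON goes here" }
  }
}
```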
Madiator2011
Madiator2011•4mo ago
madiator2011/better-comfyui:light is based on CUDA 12.1
mikmak5595
mikmak5595OP•4mo ago
And this is why I don't understand why it fails to run on runpod
Madiator2011
Madiator2011•4mo ago
You know my image is not made for serverless 🙂
mikmak5595
mikmak5595OP•4mo ago
I also tried during the last 2 weeks to use runpod's suggested images with the same results
Madiator2011
Madiator2011•4mo ago
Better Comfy UI is made for Pods not serverless
mikmak5595
mikmak5595OP•4mo ago
I tried to run it on a pod too, same results. This is why I said I'm pulling my hair out over this issue
Madiator2011
Madiator2011•4mo ago
Just testing it out. ComfyUI-IDM-VTON? @mikmak5595 do you have a workflow? I kinda need more info if I want to test. So far I installed it without issues
Madiator2011
Madiator2011•4mo ago
@mikmak5595 I see TripleDES might be causing issues. For now you need to install this: cryptography<43.0.0
Marcus
Marcus•4mo ago
Looks like it's just a warning about deprecation though, not an actual error
Madiator2011
Madiator2011•4mo ago
I know it caused issues when I worked on the WhisperX worker
Marcus
Marcus•4mo ago
@Madiator2011 are you an AI? You are here 24/7
nerdylive
nerdylive•4mo ago
No he is a person
Madiator2011
Madiator2011•4mo ago
hmm
Marcus
Marcus•4mo ago
It was a joke
Madiator2011
Madiator2011•4mo ago
No errors on my side. TemryL/ComfyUI-IDM-VTON installed fine without issues
mikmak5595
mikmak5595OP•4mo ago
This is the workflow
mikmak5595
mikmak5595OP•4mo ago
If you run my container without any command, it should process this workflow automatically. The problem is not with installing IDM-VTON; it's with running the container with IDM-VTON already installed. My container already has this module installed along with all required models. When I run it without any CMD, it fails during ComfyUI bring-up, as it calls the custom nodes' init function, which tries to use CUDA
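The failure mode described here is CUDA being touched at import time, so the whole process dies before any workflow runs. A hypothetical sketch of the deferral pattern that avoids this (this is not ComfyUI's actual code; `get_device` and `_device` are illustrative names):

```python
# Sketch: defer device-dependent initialization until first use, instead of
# running it at module import time. Merely importing this module can never
# raise "Found no NVIDIA driver"; the check happens on the first call.

_device = None  # cached after first lookup


def get_device():
    """Pick the compute device lazily, on first use."""
    global _device
    if _device is None:
        try:
            import torch  # assumed available in the worker image
            _device = "cuda" if torch.cuda.is_available() else "cpu"
        except ImportError:
            # No torch at all: fall back to CPU so callers still work
            _device = "cpu"
    return _device
```

A node that resolves its device inside its run method (rather than in `__init__` or at module scope) would load cleanly even on a host where the driver is missing or misconfigured, and only fail when actually executed.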
Madiator2011
Madiator2011•4mo ago
@mikmak5595 want me to help you with setup?
mikmak5595
mikmak5595OP•4mo ago
Yes please. I have the Docker image vazazon/comfyuivenv:dev that, when executed both in pod and serverless mode, gives the "RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx" error. The same container runs fine on my machine using: docker run -it --rm --gpus=all vazazon/comfyuivenv:dev
Madiator2011
Madiator2011•4mo ago
What is your API input? Also, I noticed you have 2 venvs in the container. @mikmak5595 would it be an issue to share the Dockerfile and the worker code you use?
mikmak5595
mikmak5595OP•4mo ago
This is the workflow. I don't have a Dockerfile, as I took RunPod's base image, installed all custom_nodes on it manually, and then ran docker commit
Madiator2011
Madiator2011•4mo ago
Cause the issue is with the image itself; it's most likely a venv mess-up. I'm not sure why you took such a hard way of doing it
mikmak5595
mikmak5595OP•4mo ago
I also tried the non-venv way. Once I installed the IDM-VTON custom_node, things started to mess up
Madiator2011
Madiator2011•4mo ago
On serverless you want to avoid venvs. Do you not plan to use network storage?
mikmak5595
mikmak5595OP•4mo ago
No, I placed all models and other data on the container image. Btw, it fails to run both on a pod and on serverless
Madiator2011
Madiator2011•4mo ago
Yeah, cause you totally messed up the Docker image. It wasn't built correctly, so the NVIDIA drivers did not get injected
mikmak5595
mikmak5595OP•4mo ago
Can you explain what you mean by "it wasn't built correctly"?
Madiator2011
Madiator2011•4mo ago
Build it with a Dockerfile and then the docker build command
mikmak5595
mikmak5595OP•4mo ago
I did build the base image using docker build and then opened bash inside and ran pip install on all custom_nodes' requirements.txt files
Madiator2011
Madiator2011•4mo ago
It's late for me, but tomorrow I can try to make you a template
mikmak5595
mikmak5595OP•4mo ago
that would be awesome!
mikmak5595
mikmak5595OP•4mo ago
Here is the Dockerfile and some additional minor changes to the https://github.com/blib-la/runpod-worker-comfy code. To build it, use the following command: docker build -t vazazon/runpod-worker-comfy:dev-full2 --target base --platform linux/amd64 .
GitHub
GitHub - blib-la/runpod-worker-comfy: ComfyUI as a serverless API o...
ComfyUI as a serverless API on RunPod. Contribute to blib-la/runpod-worker-comfy development by creating an account on GitHub.
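The core of the fix is replacing the docker commit workflow with a Dockerfile build, so every layer (including GPU runtime wiring) is produced by a proper build. A minimal sketch of the idea — the base image tag and paths below are placeholders, not the actual runpod-worker-comfy Dockerfile:

```dockerfile
# Sketch: bake the custom node into the image at build time instead of
# installing it interactively and running `docker commit`.
# NOTE: base image tag and /comfyui paths are assumptions; adjust to your setup.
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04

# Clone the custom node into ComfyUI's custom_nodes directory
RUN git clone https://github.com/TemryL/ComfyUI-IDM-VTON.git \
      /comfyui/custom_nodes/ComfyUI-IDM-VTON

# Install its Python dependencies into the same environment ComfyUI uses
RUN pip install -r /comfyui/custom_nodes/ComfyUI-IDM-VTON/requirements.txt
```

Building this way keeps the image reproducible and avoids the state a `docker commit` snapshot can capture (stray venvs, half-configured runtimes) that breaks GPU injection on the serverless hosts.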
mikmak5595
mikmak5595OP•4mo ago
I'm uploading the output docker image to dockerhub and will try this image both as pod and in serverless mode in runpod and update here
Marcus
Marcus•4mo ago
It's better to fork the repo and push your changes to GitHub than to post zip files of changes.
mikmak5595
mikmak5595OP•4mo ago
Let's first wait to see if the image runs properly on RunPod. Update: the image I created from the Dockerfile above works! The next challenge in serverless mode: running the 2nd workflow causes out-of-memory on the GPU. I will have to check how I can get ComfyUI to free up GPU memory at the beginning/end of the workflow
Madiator2011
Madiator2011•4mo ago
Nice
Encyrption
Encyrption•4mo ago
Sounds like you need to pick a GPU with more VRAM.
mikmak5595
mikmak5595OP•4mo ago
It happens only on the 2nd invocation of the API. It means ComfyUI does not free up GPU VRAM after running the workflow. Any suggestions on how to achieve that? Maybe the RunPod API provides a way to do that?
Encyrption
Encyrption•4mo ago
(links to a post from the comfyui community on Reddit)
Madiator2011
Madiator2011•4mo ago
Why not keep the model in memory unless you're switching models?
Encyrption
Encyrption•4mo ago
It's the only way I know of to free up VRAM.
mikmak5595
mikmak5595OP•4mo ago
I was able to resolve the issue by adding --lowvram to the ComfyUI command line
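For reference, this is the kind of launch command the flag goes on — the entrypoint path and the --listen/--port values here are assumptions about the image layout; --lowvram is the relevant addition:

```
python /comfyui/main.py --listen 0.0.0.0 --port 3000 --lowvram
```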
Madiator2011 (Work)
Madiator2011 (Work)•4mo ago
@mikmak5595 just checking up on you. So the issue is solved?