RunPod 2mo ago
jax

CUDA env error

Error log:

2024-05-27T14:08:55.663521063Z RuntimeError: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero.
2024-05-27T14:08:55.902287850Z --- Starting Serverless Worker | Version 1.5.0 ---

I'm using ComfyUI, so I start the ComfyUI service before runpod.serverless.start. The problem occurs occasionally. Here is my code:

echo "runpod-worker-comfy: Starting ComfyUI"
python /ComfyUI/main.py --output-directory /ComfyUI/tmp/outputs --input-directory /tmp/inputs --disable-auto-launch --disable-metadata &

echo "runpod-worker-comfy: Starting RunPod Handler"
python -u /ComfyUI/rp_handler.py

rp_handler.py contains the runpod.serverless.start code, but it looks like ComfyUI is being started too early.
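One way to rule out a startup race like the one suspected above is to have the handler wait until ComfyUI answers HTTP requests before it calls runpod.serverless.start. The following is a minimal sketch, not the worker's actual code: the helper name wait_for_server and the retry values are hypothetical, and it assumes ComfyUI listens on its default address http://127.0.0.1:8188. It would not help if the worker's GPU itself is faulty.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, retries: int = 50, delay_s: float = 0.2) -> bool:
    """Poll `url` until it answers an HTTP request or the retries run out.

    Hypothetical helper for illustration; returns True once the server responds.
    """
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status, so it is up.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: wait and try again.
            time.sleep(delay_s)
    return False
```

In rp_handler.py this could be called as wait_for_server("http://127.0.0.1:8188") just before runpod.serverless.start, exiting with an error if it returns False.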
6 Replies
digigoblin 2mo ago
Some of your message is in German or something. Which CUDA version does your base image use? You may have to set the CUDA filter on your endpoint.
jax 2mo ago
@digigoblin OK, my image uses CUDA 11.8. I'll try setting the filter. If I don't set it, what does it default to?
digigoblin 2mo ago
That should be fine then. It should only be an issue if your image uses 12.2, for example, but you get a machine with 12.0 or 12.1. I suggest getting the worker ID of the worker that had the above error and logging a support ticket for it. If you don't set the CUDA version, it uses all available versions.
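The rule of thumb in this reply can be written as a small check: the CUDA version supported by the host machine should be at least the version the image was built for. This is a simplified sketch with hypothetical function names; it ignores CUDA's minor-version compatibility rules and is not how RunPod's endpoint filter is implemented.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Turn a string such as '11.8' or '12' into a (major, minor) tuple."""
    parts = version.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return major, minor


def image_runs_on_host(image_cuda: str, host_cuda: str) -> bool:
    """Rule of thumb: the host must support at least the image's CUDA version."""
    return parse_version(host_cuda) >= parse_version(image_cuda)


print(image_runs_on_host("11.8", "12.0"))  # True: 11.8 image on a 12.0 machine
print(image_runs_on_host("12.2", "12.1"))  # False: 12.2 image on a 12.1 machine
```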
jax 2mo ago
or5ffzvunscrpp is the worker ID with the error. It appears intermittently.
digigoblin 2mo ago
I would terminate that worker and log a support ticket for RunPod to investigate what's wrong with that worker.
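To make such a ticket easier to file, the handler could log a few identifying details at startup. This is a rough sketch with a hypothetical helper name. It assumes the worker ID is exposed in the RUNPOD_POD_ID environment variable, which should be verified on the worker before relying on it.

```python
import os
from datetime import datetime, timezone


def worker_diagnostics(env=None) -> dict:
    """Collect details worth pasting into a support ticket.

    RUNPOD_POD_ID is an assumed variable name; adjust it if the worker
    exposes its ID differently.
    """
    env = os.environ if env is None else env
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "worker_id": env.get("RUNPOD_POD_ID", "unknown"),
        "cuda_visible_devices": env.get("CUDA_VISIBLE_DEVICES", "unset"),
    }


print(worker_diagnostics())
```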
jax 2mo ago
OK, thanks. I'd also like to point out again that RunPod's customer service is really good, and because of that I've recommended it to a lot of people.