RunPod4mo ago
fireice

Why "CUDA out of memory" Today ? Same image to generate portrait, yesterday is ok , today in not.

"delayTime": 133684, "error": "CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 23.68 GiB total capacity; 18.84 GiB already allocated; 1.47 GiB free; 20.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF", "executionTime": 45263, "id": "ae1e4066-e2b7-43c1-8f37-3525bda03893-e1",
32 Replies
Marcus
Marcus4mo ago
Ask the developer of the application; it has nothing to do with RunPod.
nerdylive
nerdylive4mo ago
Seems like an out-of-memory error, meaning you need a bigger GPU for that, or try unloading your other models somehow.
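For what "unloading" typically means in a PyTorch app, here is a generic sketch (not A1111- or RunPod-specific; `holder` is just a stand-in for whatever registry the app keeps its models in):

```python
import gc

import torch

def free_vram(holder: dict, key: str) -> None:
    """Drop the only reference to a cached model and release its VRAM."""
    model = holder.pop(key, None)  # remove it from the app's registry
    if model is None:
        return
    model.to("cpu")            # move weights off the GPU first
    del model                  # drop the local reference
    gc.collect()               # let Python reclaim the object
    torch.cuda.empty_cache()   # hand cached blocks back to the CUDA driver

# usage: models = {"sd15": loaded_pipeline}; free_vram(models, "sd15")
```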
fireice
fireiceOP4mo ago
I am the developer. When I use my AI app, I get CUDA out of memory. I did nothing to the app.
Marcus
Marcus4mo ago
Then it needs a larger GPU as nerdy said.
Encyrption
Encyrption4mo ago
It looks like you are trying to use a 24GB GPU when you need more VRAM. Try running it on a 48GB GPU. If that is still not enough, try an 80GB GPU.
fireice
fireiceOP4mo ago
OK, I see, I will test.
echoSplice
echoSplice4mo ago
I have exactly the same problem. We have changed nothing in our setup. Just today, most image generations fail.
echoSplice
echoSplice4mo ago
I have a second serverless endpoint running that uses the same template. That one is running fine.
nerdylive
nerdylive4mo ago
How does your setup work? Does it unload models? What models are loaded in VRAM? Maybe too many models are loaded.
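A quick way to answer that from inside a worker is to log how much VRAM is actually held; a small sketch using standard PyTorch calls:

```python
import torch

def log_vram(tag: str = "") -> None:
    """Print allocated vs. reserved VRAM so you can see what each worker holds."""
    gib = 1024 ** 3
    allocated = torch.cuda.memory_allocated() / gib                 # live tensors
    reserved = torch.cuda.memory_reserved() / gib                   # allocator cache
    total = torch.cuda.get_device_properties(0).total_memory / gib  # card capacity
    print(f"[vram{tag}] allocated={allocated:.2f} reserved={reserved:.2f} total={total:.2f} GiB")
```

Calling this before and after each generation makes it obvious whether one worker is accumulating models that the others are not.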
echoSplice
echoSplice4mo ago
I have just realised this only happened on one specific worker: m07jdb658oetph. That's why not all of the generations failed and my other endpoint runs fine.
nerdylive
nerdylive4mo ago
Interesting. When it happens, try to collect a traceback or logs.
echoSplice
echoSplice4mo ago
I have not switched it back on, but I can give you the logs from the weekend when it happened.
nerdylive
nerdylive4mo ago
Any stacktrace? Maybe somewhere here:
2024-08-03 20:33:07.360 | info | m07jdb658oetph | ', memory monitor disabled

2024-08-03 20:33:07.360 | info | m07jdb658oetph | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

2024-08-03 20:33:07.360 | info | m07jdb658oetph | For debugging consider passing CUDA_LAUNCH_BLOCKING=1

2024-08-03 20:33:07.360 | info | m07jdb658oetph | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

2024-08-03 20:33:07.360 | info | m07jdb658oetph | Warning: caught exception 'CUDA error: out of memory
2024-08-03 20:33:07.360 | info | m07jdb658oetph | ', memory monitor disabled

2024-08-03 20:33:07.360 | info | m07jdb658oetph | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

2024-08-03 20:33:07.360 | info | m07jdb658oetph | For debugging consider passing CUDA_LAUNCH_BLOCKING=1

2024-08-03 20:33:07.360 | info | m07jdb658oetph | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

2024-08-03 20:33:07.360 | info | m07jdb658oetph | Warning: caught exception 'CUDA error: out of memory
echoSplice
echoSplice4mo ago
Sorry, not sure how I would get a stacktrace. I just downloaded the logs directly from RunPod. This? { "endpointId": "6oe3safoiwidj3", "workerId": "m07jdb658oetph", "level": "info", "message": "Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.", "dt": "2024-08-03 18:27:11.64919904" }
nerdylive
nerdylive4mo ago
What's the application that you're using that creates that?
echoSplice
echoSplice4mo ago
We run Stable Diffusion with AUTOMATIC1111.
nerdylive
nerdylive4mo ago
So all workers run the same specific model, LoRAs, etc.?
echoSplice
echoSplice4mo ago
yes
nerdylive
nerdylive4mo ago
Then all of them should be throwing that OOM error if they load the same models. So there must be some workers that are loading and unloading models dynamically, and that's what you should find out from the application you're using.
echoSplice
echoSplice4mo ago
We don't have that functionality in our code. They should all load the very same way. This specific worker had a 100% fail rate though.
nerdylive
nerdylive4mo ago
Where's the code for loading the model though? I'll try to look at it. Try reporting it to RunPod for now, but I guess this is more of an application issue.
echoSplice
echoSplice4mo ago
I'll send you the start.sh and handler script.
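For context, a serverless A1111 handler in this kind of setup is usually just a thin proxy to the local web UI API. A hedged sketch, assuming A1111 was started with --api on port 7860 (the actual start.sh and handler may differ):

```python
import requests

import runpod

A1111_URL = "http://127.0.0.1:7860"  # assumes start.sh launched A1111 with --api

def handler(job):
    """Forward the serverless job input to the local A1111 txt2img endpoint."""
    payload = job["input"]
    resp = requests.post(f"{A1111_URL}/sdapi/v1/txt2img", json=payload, timeout=600)
    resp.raise_for_status()
    return resp.json()  # contains base64-encoded images on success

runpod.serverless.start({"handler": handler})
```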
nerdylive
nerdylive4mo ago
maybe you didn't unload the models somewhere
echoSplice
echoSplice4mo ago
How? I don't see a file option in DMs.
nerdylive
nerdylive4mo ago
Just the model loading code. Maybe A1111 loads models dynamically, as far as I know.
Marcus
Marcus4mo ago
It's an OOM issue. Why are you using SDP attention and not xformers?
nerdylive
nerdylive4mo ago
What about it? Does xformers have lower VRAM usage?
Marcus
Marcus4mo ago
yes
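In A1111 the attention backend is chosen by launch flags, so switching to xformers is a launch-argument change rather than a code change. A hedged sketch of a launch wrapper (the install path below is an assumption about where A1111 lives in the image; --xformers and --api are standard A1111 flags):

```python
import subprocess

# Launch A1111 with xformers attention, which generally uses less VRAM than SDP.
# The path to webui.sh is hypothetical; adjust it to your image.
subprocess.run(
    ["bash", "/workspace/stable-diffusion-webui/webui.sh", "--xformers", "--api"],
    check=True,
)
```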
echoSplice
echoSplice4mo ago
I'll try this in a new deployment. Just thought it was odd that only this one worker failed.
Marcus
Marcus4mo ago
A1111 can fail intermittently with OOM errors depending on the request. I experienced random/intermittent OOM and had to upgrade from the 24GB to the 48GB GPU tier.
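One way to keep a single oversized request from poisoning a worker is to catch the OOM inside the handler, free the cache, and fail only that job. A hedged sketch, where run_txt2img stands in for whatever actually does the generation:

```python
import torch

def safe_generate(run_txt2img, payload):
    """Run one generation; on CUDA OOM, clean up and report a per-job error."""
    try:
        return {"images": run_txt2img(payload)}
    except RuntimeError as exc:
        if "out of memory" not in str(exc).lower():
            raise                     # not an OOM: let it propagate
        torch.cuda.empty_cache()      # release cached blocks before the next job
        return {"error": f"CUDA OOM for this request: {exc}"}
```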
echoSplice
echoSplice4mo ago
Thanks for that tip.