RunPod3mo ago
fireice

Why "CUDA out of memory" Today ? Same image to generate portrait, yesterday is ok , today in not.

"delayTime": 133684, "error": "CUDA out of memory. Tried to allocate 1.50 GiB (GPU 0; 23.68 GiB total capacity; 18.84 GiB already allocated; 1.47 GiB free; 20.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF", "executionTime": 45263, "id": "ae1e4066-e2b7-43c1-8f37-3525bda03893-e1",
Marcus
Marcus3mo ago
Ask the developer of the application, it has nothing to do with RunPod.
nerdylive
nerdylive3mo ago
Seems like an out-of-memory error, meaning you need a bigger GPU for that, or try unloading your other models somehow.
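If the worker keeps more than one model resident, a minimal sketch of freeing one between jobs, assuming a PyTorch/diffusers-style object; `pipe` and `unload` are illustrative names, not part of anyone's existing code:

```python
# Sketch: best-effort release of a model's VRAM between requests.
import gc
import torch

def unload(model):
    model.to("cpu")            # move the weights off the GPU first (optional but explicit)
    del model                  # drop this reference; the caller must drop theirs too
    gc.collect()               # collect the now-unreferenced tensors
    torch.cuda.empty_cache()   # hand the allocator's cached blocks back to the driver

# usage: unload(pipe); pipe = None
```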
fireice
fireice3mo ago
I am the developer. When I use my AI app, I get CUDA out of memory. I did nothing to the app.
Marcus
Marcus3mo ago
Then it needs a larger GPU as nerdy said.
Encyrption
Encyrption3mo ago
It looks like you are trying to use a 24GB GPU when you need more VRAM. Try to run it on a 48GB GPU. If that is still not enough, then try to run it on an 80GB GPU.
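Since serverless workers can land on different cards, it can also help to log which GPU a worker actually got at startup, so OOM reports can be tied to a VRAM size; a small sketch:

```python
# Sketch: record which GPU this worker received, for correlating OOMs with card size.
import torch

props = torch.cuda.get_device_properties(0)
print(f"worker GPU: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM")
```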
fireice
fireice3mo ago
OK, I see, I will test.
echoSplice
echoSplice3mo ago
I have exactly the same problem. We have changed nothing in our setup. Just today most image generation fails
echoSplice
echoSplice3mo ago
I have a second serverless endpoint running that uses the same template. that one is running fine
nerdylive
nerdylive3mo ago
How does your setup work? Does it unload models? What models are loaded in VRAM? Maybe too many models are loaded.
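One way to answer that is to log the allocator's view of VRAM around model loads and requests; a sketch, assuming PyTorch is the only significant consumer on GPU 0:

```python
# Sketch: print how much VRAM PyTorch is using on GPU 0.
import torch

def log_vram(tag=""):
    gib = 1024 ** 3
    allocated = torch.cuda.memory_allocated(0) / gib                 # live tensors
    reserved = torch.cuda.memory_reserved(0) / gib                   # cached by the allocator
    total = torch.cuda.get_device_properties(0).total_memory / gib   # card capacity
    print(f"[vram{tag}] allocated={allocated:.2f} GiB reserved={reserved:.2f} GiB total={total:.2f} GiB")

# call before/after each model load and each request, e.g. log_vram(" after load")
```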
echoSplice
echoSplice3mo ago
I have just realised this only happened on one specific worker: m07jdb658oetph. That's why not all of the generations failed and my other endpoint runs fine.
nerdylive
nerdylive3mo ago
Interesting. When it happens, try to collect a traceback or logs.
echoSplice
echoSplice3mo ago
I have not switched it back on. But I can give you the logs from the weekend when it happened
nerdylive
nerdylive3mo ago
Any stack trace? Maybe somewhere here:
2024-08-03 20:33:07.360 | info | m07jdb658oetph | ', memory monitor disabled

2024-08-03 20:33:07.360 | info | m07jdb658oetph | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

2024-08-03 20:33:07.360 | info | m07jdb658oetph | For debugging consider passing CUDA_LAUNCH_BLOCKING=1

2024-08-03 20:33:07.360 | info | m07jdb658oetph | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

2024-08-03 20:33:07.360 | info | m07jdb658oetph | Warning: caught exception 'CUDA error: out of memory
echoSplice
echoSplice3mo ago
Sorry, not sure how I would get a stack trace. I just downloaded the logs directly from RunPod. This? {"endpointId":"6oe3safoiwidj3", "workerId":"m07jdb658oetph", "level":"info", "message":"Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.", "dt":"2024-08-03 18:27:11.64919904"}
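For future runs, a full Python traceback can be captured in the handler itself and returned with the job result instead of being fished out of the container logs; a sketch assuming the standard RunPod Python serverless handler pattern (`run_generation` is a placeholder for the existing A1111 call):

```python
# Sketch: wrap the existing job logic so OOM errors come back with a full traceback.
import traceback
import runpod
import torch

def handler(job):
    try:
        return run_generation(job["input"])   # placeholder for the existing generation call
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()              # try to recover the worker for the next job
        return {"error": "CUDA OOM", "traceback": traceback.format_exc()}
    except Exception:
        return {"error": "unhandled exception", "traceback": traceback.format_exc()}

runpod.serverless.start({"handler": handler})
```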
nerdylive
nerdylive3mo ago
What's the application that you're using that creates that?
echoSplice
echoSplice3mo ago
We run Stable Diffusion with AUTOMATIC1111.
nerdylive
nerdylive3mo ago
So all workers run the same specific model, LoRAs, etc.?
echoSplice
echoSplice3mo ago
yes
nerdylive
nerdylive3mo ago
Then all of them should be throwing that OOM error if they load the same models. So there must be some workers that are loading and unloading models dynamically, and that's what you should find out from the application you're using.
echoSplice
echoSplice3mo ago
We don't have that functionality in our code; they should all load the very same way. This specific worker had a 100% fail rate, though.
nerdylive
nerdylive3mo ago
Where's the code for loading the model, though? I'll try to look at it. Try reporting it to RunPod for now, but I guess this is more an application issue.
echoSplice
echoSplice3mo ago
I'll send you the start.sh and handler script.
nerdylive
nerdylive3mo ago
maybe you didn't unload the models somewhere
echoSplice
echoSplice3mo ago
How? I don't see a file option in DM.
nerdylive
nerdylive3mo ago
Just the model loading code. A1111 loads models dynamically, as far as I know.
Marcus
Marcus3mo ago
It's an OOM issue. Why are you using SDP attention and not xformers?
nerdylive
nerdylive3mo ago
What about it? Does xformers have lower VRAM usage?
Marcus
Marcus3mo ago
yes
echoSplice
echoSplice3mo ago
I'll try this in a new deployment. Just thought it was odd that only this one worker failed.
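For reference, switching A1111 from SDP attention to xformers is a launch-flag change; a hedged sketch of what the launch step might look like (the flags are real A1111 options, but the path and the way your start.sh invokes it are assumptions):

```python
# Sketch: start A1111 with xformers attention instead of SDP.
# --xformers replaces --opt-sdp-attention; --api/--nowebui is the usual headless setup
# behind a serverless handler. Adjust the launch path to your image.
import subprocess

subprocess.run(
    [
        "python", "launch.py",
        "--api", "--nowebui",
        "--xformers",
    ],
    check=True,
)
```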
Marcus
Marcus3mo ago
A1111 can fail intermittently with OOM errors based on your request. I experienced random/intermittent OOM and had to upgrade from 24GB to 48GB GPU tier.
echoSplice
echoSplice3mo ago
Thanks for that tip.