24 GB of VRAM is not enough for simple kohya_ss LoRA generation.

How come 24 GB of VRAM is not enough for generating a simple LoRA in kohya_ss? I've tried running it with the simplest configuration: 32 pictures, fp16, AdamW8bit, no batching or other demanding features, and CUDA constantly runs out of memory. I've tried launching it 4-5 times, clearing the cache, setting the memory limit in PyTorch, and making sure nothing else is using VRAM. It still runs out of memory every time. The funniest part is that I've successfully launched the same configuration on my old PC with a GTX 960 4GB; it's slow, but it does not run out of VRAM. Why can't the pods here handle it? I ended up running it on a 48 GB VRAM instance, and it uses around 33 GB of that. Why can my 4 GB card run it with pretty much the same config? Is it possible to achieve the same result here?
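(For context, the cache-clearing and PyTorch memory-limit steps mentioned above usually look something like the sketch below. The allocator setting and the 0.9 fraction are illustrative assumptions, not values from the original post.)

```python
import os
import torch

# Reduce allocator fragmentation; must be set before CUDA is first used.
# The 512 MB split size is an example value, not from the original post.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

# Cap how much of the card PyTorch may allocate (0.9 of 24 GB is ~21.6 GB).
torch.cuda.set_per_process_memory_fraction(0.9, device=0)

# Free cached blocks between attempts and check what is actually available.
torch.cuda.empty_cache()
free_bytes, total_bytes = torch.cuda.mem_get_info(0)
print(f"free: {free_bytes / 1e9:.1f} GB of {total_bytes / 1e9:.1f} GB")
```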
7 Replies
ashleyk · 12mo ago
Log an issue in the Kohya_ss repo; this is not a RunPod issue.
Andrew_Rocket (OP) · 12mo ago
Do you know where I can find more information about this problem?
ashleyk · 12mo ago
Which template are you using?
Andrew_Rocket (OP) · 12mo ago
Oh, you mean an issue in the template?
Andrew_Rocket (OP) · 12mo ago
I tried using both this and this. They have the same author, so they might have the same problems.
ashleyk · 12mo ago
No, I didn't say that; I just asked which template you are using. For the Ultimate one, you have to connect to port 8000 and stop A1111 before training with Kohya_ss. Also, if you are training SDXL, people use Adafactor rather than 8-bit Adam.
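(On the Adafactor point: a minimal sketch of the constant-learning-rate settings commonly suggested for SDXL LoRA training, assuming the `transformers` Adafactor implementation. The learning rate and the dummy model below are placeholders, not template defaults.)

```python
import torch
from transformers.optimization import Adafactor

# Placeholder network standing in for the LoRA parameters being trained.
model = torch.nn.Linear(16, 16)

# Constant-LR Adafactor; it keeps a factored approximation of the second
# moments instead of full per-parameter optimizer state.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-4,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
```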
Andrew_Rocket (OP) · 12mo ago
Yeah, I'd read about that, and I did it. It didn't change anything, though. I wasn't training SDXL; at that point I was just trying to run the simplest config I could to understand why 24 GB of VRAM was running out.