24 GB of VRAM is not enough for simple kohya_ss LoRA training.
How come 24 GB of VRAM is not enough for training a simple LoRA in kohya_ss?
I've tried running it with the simplest configuration: 32 pictures, fp16, AdamW8bit, no batching or other demanding features, and CUDA constantly runs out of memory. I've tried launching it 4-5 times, clearing the cache, setting the allocation limit in PyTorch, and making sure nothing else is using VRAM. It still runs out of memory every time.
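For what it's worth, the cache clearing and PyTorch limit I mentioned were roughly along these lines (a sketch from memory; the exact values I used may have differed):

```python
import os

# Cap the allocator's split size; must be set before CUDA is initialized
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch

# Hard-cap this process to ~90% of the card's VRAM
torch.cuda.set_per_process_memory_fraction(0.9, device=0)

# Release cached blocks back to the driver between attempts
torch.cuda.empty_cache()
```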
The funniest part is that I've successfully run the same configuration on my old PC with a GTX 960 4GB; it is slow, but it does not run out of VRAM. Why can't the pods here handle it?
I ended up running it on a 48 GB VRAM instance, and it uses around 33 GB. Why can my 4 GB card run it with pretty much the same config? Is it possible to achieve the same result here?
Log an issue in the Kohya_ss repo; this is not a RunPod issue.
Do you know where I can learn more about this problem?
Which template are you using?
Oh, you mean an issue in the template.
I tried using both this and this. They have the same author, so they might have the same problems.
No, I didn't say that; I just asked which template you are using.
For the Ultimate one, you have to connect to port 8000 and stop A1111 before training with Kohya_ss.
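If you want to double-check that A1111 has actually released the card, a quick sketch like this (run right before launching training) prints free vs. total VRAM:

```python
import torch

# Sanity check that nothing else is still holding VRAM
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 1e9:.1f} GB / total: {total / 1e9:.1f} GB")
if free < 0.9 * total:
    print("Something is still using VRAM - stop it before training.")
```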
Also, if you are training SDXL, people typically use Adafactor rather than 8-bit Adam.
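For context, here is a minimal sketch of how the two optimizers would be constructed directly; these are the classes kohya_ss wraps, but the learning rate and flags are illustrative, not its defaults:

```python
import torch
import bitsandbytes as bnb
from transformers.optimization import Adafactor

# Stand-in for LoRA parameters
params = [torch.nn.Parameter(torch.zeros(8, 8, device="cuda"))]

# 8-bit Adam: quantized optimizer state, still two state tensors per weight
adamw8bit = bnb.optim.AdamW8bit(params, lr=1e-4)

# Adafactor: factored second-moment state, so less optimizer memory on large models
adafactor = Adafactor(
    params, lr=1e-4, scale_parameter=False, relative_step=False, warmup_init=False
)
```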
Yeah, I'd read about that, and I did stop it. That didn't change anything, though.
I wasn't training SDXL; at that point I was just trying to run the simplest config I could to understand why 24 GB of VRAM was running out.