AI Toolkit LoRA Training torch.OutOfMemoryError
I've tried different pods for Flux LoRA training with AI Toolkit and had no luck at all.
I even used 2x RTX 4090 (24 vCPU, 62 GB RAM) and it still reported "torch.OutOfMemoryError". How can that be???
The RTX 6000 Ada (48 GB VRAM, 188 GB RAM, 24 vCPU) could start the training process, but it took more than 10 minutes (!!) to generate a sample image, and the result was practically invisible (denoise < 0.2). How is that possible?
Template: runpod/pytorch:2.2.0-py3.10-cuda12.1.1-devel-ubuntu22.04 (officially recommended)
Optimizer: adamw
Model: Flux.Dev
low_vram: false
quantize: false
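For scale: FLUX.1-dev is roughly a 12B-parameter transformer, so the bf16 weights alone are about 12e9 × 2 bytes ≈ 24 GB. With quantize: false that already fills a 24 GB RTX 4090 before activations, gradients, and the adamw optimizer state are counted, and AI Toolkit trains on a single GPU, so the second 4090 adds no usable VRAM. Assuming this is the ostris/ai-toolkit trainer, here is a minimal sketch of the memory-relevant settings, modeled on its train_lora_flux_24gb.yaml example (key names and placement may differ in your version, so check your config against the bundled examples):

```yaml
# Sketch of the memory-relevant part of an ai-toolkit FLUX LoRA config.
# Assumes the ostris/ai-toolkit layout (as in train_lora_flux_24gb.yaml);
# verify key names and nesting against your installed version.
config:
  process:
    - type: "sd_trainer"
      model:
        name_or_path: "black-forest-labs/FLUX.1-dev"
        is_flux: true
        quantize: true      # 8-bit weights: roughly half the ~24 GB bf16 footprint
        low_vram: true      # offloads parts of the model when VRAM is tight (slower)
      train:
        batch_size: 1
        gradient_checkpointing: true  # trades extra compute for a big activation-memory saving
        optimizer: "adamw8bit"        # 8-bit optimizer states instead of full-precision adamw
        dtype: bf16
```

With the model quantized and an 8-bit optimizer instead of plain adamw, this kind of setup is what the 24 GB example config targets; the 48 GB RTX 6000 Ada only got past loading because it had room for the unquantized weights.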
It could be that your training app doesn't support multi-GPU.
Indeed. It doesn't support multi-GPU. Hah.
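Since the trainer only ever uses one device, it's worth pinning it explicitly so the run doesn't depend on which GPU PyTorch picks first; setting CUDA_VISIBLE_DEVICES=0 in the shell achieves the same thing. A sketch, assuming the same ai-toolkit config layout as above:

```yaml
# Pin the trainer to a single GPU; the second card is simply left unused.
# Key placement assumes the ai-toolkit example configs; verify against your version.
config:
  process:
    - type: "sd_trainer"
      device: cuda:0
```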