RunPod
KarMa · 11mo ago

How to use multiple GPUs for Kohya Training?

I am using 2x GPUs for training with Kohya (Dreambooth), but it is not utilizing both GPUs; only one GPU is being used. I have tried the following:
1. Added "set CUDA_VISIBLE_DEVICES=1" to the gui.bat file of kohya_ss.
2. Added these parameters (hardcoded) to the run_cmd in dreambooth_gui.py: "accelerate launch --num_processes=[NUM_YOUR_GPUS_PER_MACHINE] --num_machines=[NUM_YOUR_INDEPENDENT_MACHINES] --multi_gpu --gpu_ids=[GPU_IDS] args..."
3. Changed the config files, setting num_processes = 2.
Still, when I restart the training, only one GPU is utilized. How can I get training to use multiple GPUs?
p.s. Docker image I am using: ashleykza/stable-diffusion-webui:3.12.0
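A likely factor in attempt 1: `CUDA_VISIBLE_DEVICES` lists which GPUs a process is allowed to see, so `CUDA_VISIBLE_DEVICES=1` hides GPU 0 and leaves exactly one device visible. The Python sketch below models that mapping for a two-GPU pod. The helper `visible_gpu_ids` is hypothetical, it only understands integer indices (not GPU UUIDs), and, as CUDA does with an invalid entry, it stops at the first index it cannot resolve.

```python
import os
from typing import Iterable, List, Optional


def visible_gpu_ids(cuda_visible_devices: Optional[str],
                    physical_gpus: Iterable[int]) -> List[int]:
    """Return the physical GPU indices a CUDA process would see."""
    installed = list(physical_gpus)
    if cuda_visible_devices is None:
        # Variable unset: every installed GPU is visible.
        return installed
    visible: List[int] = []
    for token in cuda_visible_devices.split(","):
        token = token.strip()
        # Integer indices only in this sketch; stop at the first entry
        # that is not an installed GPU index.
        if not token.isdigit() or int(token) not in installed:
            break
        visible.append(int(token))
    return visible


if __name__ == "__main__":
    gpus = range(2)  # assumption: a 2x GPU pod with indices 0 and 1
    for value in (None, "1", "0,1"):
        print(f"CUDA_VISIBLE_DEVICES={value!r}: {visible_gpu_ids(value, gpus)}")
    current = os.environ.get("CUDA_VISIBLE_DEVICES")
    print("this shell:", visible_gpu_ids(current, gpus))
```

With `1` only GPU 1 is visible, which matches the symptom; `0,1`, or leaving the variable unset, exposes both GPUs.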
ashleyk · 11mo ago
RunPod runs Linux, not Windows, so you have to edit the .sh files, NOT the .bat files. For everything else, I suggest opening a GitHub issue in the Kohya_ss repo: https://github.com/bmaltais/kohya_ss
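On Linux a line added to gui.bat is never executed; the bash equivalent in the .sh launcher is `export CUDA_VISIBLE_DEVICES=0,1` before `accelerate launch`. As a hedged sketch, this Python snippet builds that launch for one machine with one process per GPU. The accelerate flags (`--multi_gpu`, `--num_processes`, `--num_machines`, `--gpu_ids`) are the ones from the question; `build_multi_gpu_launch` and `launch` are hypothetical helpers, and `train_db.py` plus its argument are placeholders for whatever `run_cmd` in dreambooth_gui.py actually runs.

```python
import os
import shlex
import subprocess
from typing import Dict, List, Sequence, Tuple


def build_multi_gpu_launch(gpu_ids: Sequence[int],
                           script: str,
                           script_args: Sequence[str]) -> Tuple[List[str], Dict[str, str]]:
    """Build an `accelerate launch` command and its environment."""
    ids = ",".join(str(i) for i in gpu_ids)
    cmd = [
        "accelerate", "launch",
        "--multi_gpu",
        f"--num_processes={len(gpu_ids)}",  # one process per GPU
        "--num_machines=1",                 # single pod
        f"--gpu_ids={ids}",
        script,
        *script_args,
    ]
    env = dict(os.environ)
    # Same effect as `export CUDA_VISIBLE_DEVICES=0,1` in a .sh script.
    env["CUDA_VISIBLE_DEVICES"] = ids
    return cmd, env


def launch(cmd: List[str], env: Dict[str, str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    # Placeholder script and argument; substitute the real Kohya command.
    cmd, env = build_multi_gpu_launch([0, 1], "train_db.py",
                                      ["--output_dir=/workspace/output"])
    print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(cmd)}")
```

The printed line is the shell form you would put in the .sh launcher; calling `launch(cmd, env)` starts training with that environment.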
KarMa (OP) · 11mo ago
Where is the nccl.conf file located on RunPod? I want to edit some environment variables and found this in the NVIDIA docs: "NCCL has an extensive set of environment variables to tune for specific usage. They can also be set statically in /etc/nccl.conf (for an administrator to set system-wide values) or in ~/.nccl.conf (for users)." But I couldn't find it in either place. Ref: https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html
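The quoted docs present these files as an optional place to set variables statically, so on a stock image it is likely (an assumption, not verified for this image) that neither exists and you simply create one. A minimal Python sketch, assuming the file holds one `NAME=value` pair per line; `render_nccl_conf` and `write_nccl_conf` are hypothetical helpers, and `NCCL_DEBUG=INFO` is used as an example because it makes NCCL log its initialization. Exporting the same variables in the .sh launcher is an alternative.

```python
from pathlib import Path
from typing import Mapping


def render_nccl_conf(settings: Mapping[str, str]) -> str:
    """Format settings as NAME=value lines (assumed nccl.conf format)."""
    lines = []
    for name, value in settings.items():
        # Guard against typos: only NCCL_* variable names are accepted.
        if not name.startswith("NCCL_"):
            raise ValueError(f"not an NCCL variable: {name}")
        lines.append(f"{name}={value}\n")
    return "".join(lines)


def write_nccl_conf(settings: Mapping[str, str], path: Path) -> None:
    """Create or overwrite the config file at `path`."""
    path.write_text(render_nccl_conf(settings))


if __name__ == "__main__":
    settings = {"NCCL_DEBUG": "INFO"}  # log NCCL setup details
    print(render_nccl_conf(settings), end="")
    # To create the per-user file named in the NCCL docs:
    # write_nccl_conf(settings, Path.home() / ".nccl.conf")
```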