Trouble training SDXL LoRA with kohya
It seems to be getting stuck partway through the process. Is anyone else having the same issue?
I am using a 4090
Any errors? Any other details?
Maybe share the logs.
0250319-160845.toml
16:08:45-110488 INFO Command executed.
2025-03-19 16:08:50.583194: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-03-19 16:08:50.583238: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-03-19 16:08:50.584396: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-19 16:08:50.590100: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
This is where it gets stuck.
I've seen similar errors when the CUDA version in your template or Docker image doesn't match what the GPU's driver supports.
Check whether your script recognizes the GPU as a CUDA device; it may be defaulting to the CPU.
I'm kind of green in this area. How would I do that?
I can't give concrete instructions, since I don't know whether you're on a template, an image, or something else. Google is your friend; it's a well-documented and common problem.
Yeah, your Docker image probably needs to be built from NVIDIA's CUDA base image,
and the TensorFlow package in it should be the GPU version.
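For anyone landing here with the same question, the "check if the script sees the GPU" step can be sketched roughly like this. It assumes the training itself runs through PyTorch (kohya's scripts are built on it) and that you run it inside the same environment or container as the training. The function names `cuda_report` and `driver_info` are made up for this example, and `nvidia-smi` may not be on the PATH in every image.

```python
import importlib.util
import shutil
import subprocess


def cuda_report():
    """Return a dict describing whether PyTorch can see a CUDA GPU."""
    report = {
        "torch_installed": False,
        "torch_cuda_build": None,   # CUDA version PyTorch was built against
        "cuda_available": False,
        "devices": [],
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment
    import torch

    report["torch_installed"] = True
    report["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


def driver_info():
    """Return GPU name and driver version from nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None  # tool not installed or not on PATH
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,driver_version",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=False, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # e.g. no driver visible inside the container
    return result.stdout.strip()


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
    print(f"nvidia-smi: {driver_info()}")
```

If `cuda_available` comes back False while `nvidia-smi` does list the 4090, that would point toward a CPU-only PyTorch build or a CUDA version mismatch in the image rather than a hardware problem. Running plain `nvidia-smi` also shows the highest CUDA version the driver supports in its header, which you can compare against `torch_cuda_build`.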