RunPod, 3w ago
Fluxz

Trouble training SDXL LoRA with Kohya

It seems as if it's getting stuck partway through the process. Is anyone else having the same issue?
7 Replies
Fluxz (OP), 3w ago
I am using a 4090
Jason, 3w ago
Any errors? Any other details? Maybe share the logs.
Fluxz (OP), 3w ago
0250319-160845.toml
16:08:45-110488 INFO Command executed.
2025-03-19 16:08:50.583194: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-03-19 16:08:50.583238: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-03-19 16:08:50.584396: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-03-19 16:08:50.590100: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
This is where it gets stuck.
RollyPolly, 3w ago
I've seen similar errors with a CUDA version mismatch between your template or Docker image and the GPU. Check whether your script recognizes the GPU as a CUDA device; it may be defaulting to CPU.
Fluxz (OP), 3w ago
I'm kind of green in this area. How would I do that?
RollyPolly, 3w ago
I can't give concrete instructions, since I don't know whether you're on a template, an image, or something else. Google is your friend; it's a well-documented and common problem.
Jason, 3w ago
Yeah, maybe your Docker image needs to be built from NVIDIA's CUDA base image, and the TensorFlow package installed in it should be the GPU version.
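
The check suggested earlier in the thread (does the environment see the GPU as a CUDA device, or is it falling back to CPU?) could be sketched roughly as below. This is a minimal sketch. It assumes the training environment uses PyTorch, which the log above does not confirm, and that `nvidia-smi` is on the PATH when a driver is present. The function name `cuda_report` and the report keys are made up for illustration. Run it inside the same pod or container, with the same Python environment that runs the training.

```python
import importlib.util
import shutil
import subprocess


def cuda_report():
    """Collect basic GPU-visibility facts; returns a dict and never raises
    just because a tool or package is missing. (Hypothetical helper.)"""
    report = {
        "nvidia_smi_found": False,     # is the NVIDIA driver CLI on PATH?
        "driver_output": None,         # GPU name and driver version, if readable
        "torch_installed": False,      # assumption: training uses PyTorch
        "torch_cuda_build": None,      # CUDA version PyTorch was built against
        "torch_cuda_available": None,  # False usually means a CPU fallback
        "device_name": None,
    }

    # 1) Ask the driver which GPU it sees.
    smi = shutil.which("nvidia-smi")
    if smi:
        report["nvidia_smi_found"] = True
        try:
            result = subprocess.run(
                [smi, "--query-gpu=name,driver_version", "--format=csv,noheader"],
                capture_output=True, text=True, timeout=30,
            )
            report["driver_output"] = result.stdout.strip() or None
        except (OSError, subprocess.SubprocessError):
            pass  # driver tool exists but could not be queried

    # 2) Ask PyTorch whether it can use CUDA.
    if importlib.util.find_spec("torch") is not None:
        try:
            import torch
        except Exception:
            return report  # package present but broken; leave fields unset
        report["torch_installed"] = True
        report["torch_cuda_build"] = torch.version.cuda  # None on CPU-only builds
        report["torch_cuda_available"] = torch.cuda.is_available()
        if report["torch_cuda_available"]:
            report["device_name"] = torch.cuda.get_device_name(0)

    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `nvidia_smi_found` is true but `torch_cuda_available` is false, that would be consistent with the mismatch described above (for example a CPU-only build, or a CUDA build the installed driver does not support), though it does not prove that this is why training hangs.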
