worker keeps dying while training a lora model
Even after setting the worker to be active, it keeps dying after about 2 minutes. Is there a way to prevent this?
Hmm, yeah, I wonder if this is normal. The idle timeout doesn't seem to work even though the worker is set to active like it's supposed to be.
removing execution timeout fixed it
@Tim aka NERDDISCO this may be a bug in RunPod
@shawtyisaten would you mind providing the endpoint ID or some more info about the Docker image you used?
I'm not sure if it's a bug. I think it worked as intended, since I had set the execution timeout. The endpoint ID is z398ywur6g1041. The Docker image is a custom one I made for training a Flux LoRA model.
I just thought it was unexpected because I don't remember checking that box. I think it's checked by default when you create a worker.
This behavior is intentional. The execution timeout is designed to prevent a worker from running indefinitely, which could happen if there’s a bug in the code or a long-running process that could potentially drain all your credits.
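For anyone who finds this thread later: instead of unchecking the execution timeout entirely, you can keep the safety net and just give long training jobs more headroom. Below is a minimal sketch of submitting a job with a longer per-request execution policy. This assumes RunPod's `/run` endpoint accepts a `policy.executionTimeout` field in milliseconds and a `Bearer` auth header (check the current RunPod docs to confirm); the endpoint ID, input fields, and timeout value are placeholders for your own setup.

```python
import os
import requests

# Placeholder endpoint ID; replace with your own serverless endpoint.
ENDPOINT_ID = "z398ywur6g1041"
RUN_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run"

payload = {
    "input": {
        # Whatever your custom LoRA-training image expects (hypothetical fields).
        "dataset_url": "https://example.com/dataset.zip",
        "steps": 2000,
    },
    # Assumption: per-request execution policy with executionTimeout in milliseconds.
    # Raising the cap keeps the worker from being killed mid-training while still
    # stopping runaway jobs from draining credits.
    "policy": {
        "executionTimeout": 2 * 60 * 60 * 1000,  # 2 hours
    },
}

resp = requests.post(
    RUN_URL,
    json=payload,
    headers={"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # returns the job ID you can poll via /status/{id}
```

This way you keep the protection yhlong00000 described but the timeout is sized to the actual training run rather than the default.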
ohh
@yhlong00000 thanks for the clarification!