worker keeps dying while training a lora model

even after setting the worker to be active, it keeps dying after like 2 minutes. is there a way to prevent this?
nerdylive · 4mo ago
Hmm yeah, I wonder if this is normal. The idle timeout seems not to work, and setting the worker to active isn't behaving as it's supposed to.
shawtyisaten (OP) · 4mo ago
Removing the execution timeout fixed it.
nerdylive · 4mo ago
@Tim aka NERDDISCO this may be a bug in RunPod
NERDDISCO · 4mo ago
@shawtyisaten would you mind providing the endpoint ID or some more info about the Docker image you used?
shawtyisaten (OP) · 4mo ago
I'm not sure it's a bug; I think it worked as intended since I had set the execution timeout. The endpoint ID is z398ywur6g1041, and the Docker image is a custom one I made for training a Flux LoRA model. I just thought it was unexpected because I don't remember checking that box; I think it's checked by default when you create a worker.
yhlong00000 · 4mo ago
This behavior is intentional. The execution timeout is designed to prevent a worker from running indefinitely, which could happen if there’s a bug in the code or a long-running process that could potentially drain all your credits.
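For what it's worth, rather than removing the execution timeout entirely, you can also raise it for a long-running training job. Below is a minimal sketch in Python (assuming the `requests` library, a `RUNPOD_API_KEY` environment variable, and a per-request execution policy with `executionTimeout` in milliseconds, per RunPod's execution policy docs); the training payload is a placeholder, so adapt it to your endpoint's handler.

```python
import os
import requests

# Sketch: submit a job to the serverless endpoint with a per-request
# execution timeout, instead of disabling the endpoint-level timeout.
ENDPOINT_ID = "z398ywur6g1041"          # endpoint ID from this thread
API_KEY = os.environ["RUNPOD_API_KEY"]  # assumed to be set in the environment

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {"job": "train-flux-lora"},                  # placeholder payload
        "policy": {"executionTimeout": 2 * 60 * 60 * 1000},   # allow up to 2 hours
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # returns the job ID, which you can poll via /status/{id}
```

Keeping a generous but finite timeout preserves the runaway-job protection described above while still letting a LoRA training run finish.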
nerdylive · 4mo ago
ohh
NERDDISCO · 4mo ago
@yhlong00000 thanks for the clarification!