Not using cached worker
I've been running into this problem for several days now. I have an endpoint that runs a Forge WebUI worker with a network volume attached. As you know, Forge takes some time to start before it can generate anything, so when I send a request to a fresh worker there's a startup delay before it generates images.
But recently I've run into an issue where a worker is already running with WebUI Forge started and ready to accept requests, yet when I submit a new request it starts a completely new worker, which results in huge delay times. My question is: why isn't it using the already available worker that has Forge loaded?
And no, the requests weren't submitted one right after the other, so there is no reason to start a new worker
13 Replies
For anyone who needs clarification on the image: the logs are from the worker highlighted in grey, which already has Forge loaded. But as you can see, a completely new worker is being started, ignoring the worker that has Forge loaded
Is it like a 1 second difference, or did it load a long time ago?
Maybe FlashBoot doesn't work every time and keep the worker "warm" for long if your requests are sporadic, or it's your endpoint scaling type that starts a new worker
This is my endpoint scaling type, should I change anything?
A long time. For example, if I submit a request at 5:00 PM it loads Forge and runs it, and then if I submit again at about 5:03 PM, instead of running the request on the worker that already has Forge loaded, it runs it on a new worker
Yeah, maybe try Request Count, because with Queue Delay, if a job sits in the queue for more than 4 seconds while another one is still processing, it'll start a new worker
But is the worker ready to receive requests, or is it busy processing?
Yes, the worker that has Forge loaded is still available and has no requests running. I can check from the logs, and the logs say it has Forge loaded
Oh, then it has been unloaded
Or maybe RunPod's system just works that way sometimes
Try setting idle timeout to 3 minutes and 30 seconds
Or 4 minutes
If you want to make sure that happens
I'll try that. I wanted to keep costs down, but I don't want unavailable workers and long startup times either
Also, is there a way to cancel a job if the delay is too high?
Currently I cancel these manually, but it would be nice if there were an automated way to do it
Unfortunately, Enable Execution Timeout doesn't work for delay times, only execution times
Maybe from your client? Add some logic there
If you poll the status and it's still in the queue after x seconds, just cancel the job
Or just keep requests coming in
Thanks, I just looked at the documentation; I think I'll be able to do that
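For reference, here is a minimal sketch of that client-side cancel logic against the RunPod serverless REST API (/run, /status/{id}, /cancel/{id}). The endpoint ID, API key, and the 30-second queue tolerance are placeholders, not values from this thread:

```python
import time
import requests

# Placeholder values, replace with your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

MAX_QUEUE_DELAY = 30  # seconds we tolerate a job sitting in the queue


def submit_and_watch(payload):
    """Submit a job with /run, then cancel it if it stays IN_QUEUE too long."""
    job = requests.post(f"{BASE_URL}/run", json={"input": payload}, headers=HEADERS).json()
    job_id = job["id"]
    submitted_at = time.time()

    while True:
        status = requests.get(f"{BASE_URL}/status/{job_id}", headers=HEADERS).json()

        if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
            return status

        # Still queued past our tolerance: cancel it from the client side.
        if status["status"] == "IN_QUEUE" and time.time() - submitted_at > MAX_QUEUE_DELAY:
            requests.post(f"{BASE_URL}/cancel/{job_id}", headers=HEADERS)
            return None

        time.sleep(2)
```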
Btw, for image generation tasks which is best: run or run_sync?
Yup
Run I guess
Always run
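To illustrate the difference (same placeholder endpoint ID and API key as above, plus a made-up prompt): /runsync is a single blocking call that holds the HTTP connection open until the job finishes, while /run returns a job ID immediately so you can poll /status yourself, which is also what makes the queue-delay cancel logic sketched earlier possible. A rough sketch:

```python
import requests

# Placeholder values, same assumptions as the earlier sketch.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = "your-runpod-api-key"
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}
payload = {"input": {"prompt": "a mountain lake at sunrise"}}  # hypothetical input

# /runsync: blocks until the job completes and returns the output directly.
# Simple, but the request stays open through any cold start / Forge load time.
result = requests.post(f"{BASE_URL}/runsync", json=payload, headers=HEADERS).json()

# /run: returns right away with a job ID; poll /status/{id} for the result.
# Better suited to longer image generation jobs and client-side cancellation.
job = requests.post(f"{BASE_URL}/run", json=payload, headers=HEADERS).json()
print(job["id"])
```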