Auto-scaling issues with A1111
Hey, I'm running an A1111 worker (https://github.com/ashleykleynhans/runpod-worker-a1111) on Serverless, but I'm running into an issue with auto-scaling.
The problem is that a newly added worker becomes available (green) before A1111 has finished booting. Because of this, new requests are sent to the new worker immediately, and older workers are shut down if they haven't received any requests within 5 seconds. This usually results in all active workers being shut down and a long queue building up, because none of the newly added workers have finished booting A1111 yet.
I tried increasing the idle timeout, e.g. to 180 seconds, but then the workers never scale down.
Questions:
1. How can I make the worker available (green) only once A1111 has finished booting?
2. Is it possible to also remove workers based on the queue delay setting? E.g. if a request waits in the queue for less than 10 seconds, one worker is removed.
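To make the rule proposed in question 2 concrete, here is a hypothetical scale-up/scale-down decision based on queue delay. This is not a RunPod API or setting: the class, function and field names and the 30-second scale-up threshold are made up for illustration, while the 10-second scale-down threshold comes from the example in the question.

```python
from dataclasses import dataclass


@dataclass
class ScalePolicy:
    # Hypothetical thresholds; these are NOT actual RunPod setting names.
    scale_up_delay_s: float = 30.0    # add a worker if a request waited longer than this
    scale_down_delay_s: float = 10.0  # remove a worker if a request waited less than this
    min_workers: int = 0
    max_workers: int = 5


def desired_workers(current: int, queue_delay_s: float, policy: ScalePolicy) -> int:
    """Return the worker count after applying the queue-delay rule once."""
    if queue_delay_s > policy.scale_up_delay_s:
        target = current + 1
    elif queue_delay_s < policy.scale_down_delay_s:
        target = current - 1
    else:
        target = current  # delay is between the thresholds: leave the pool as-is
    # Clamp to the configured minimum and maximum worker counts.
    return max(policy.min_workers, min(policy.max_workers, target))
```

Keeping a gap between the two thresholds (here 10 s vs 30 s) is a common way to avoid a worker being added and removed on alternating checks.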
You can use the idle timeout setting.
I'm not the best with serverless scaling, so you might be better off sending a ticket through the website.
The exact scenario you are describing may not be possible, but AFAIK you'll need to find some kind of "sweet spot" between the idle timeout and the queue delay setting.
I believe you can also handle this programmatically on your end by pinging the "ready" endpoint (or whatever it's called); once it responds, you know the worker is available.
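A minimal sketch of the readiness idea, done inside the worker rather than from the client: poll A1111's API until it answers, and only then start the handler loop. The names `http_probe` and `wait_until_ready` and their defaults are hypothetical; the probe, sleep and clock are injectable so the loop can be tested without a running server.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if `url` answers with HTTP 200, False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, refused connections and timeouts all subclass OSError.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses.

    Returns True once ready, False if the deadline passed first.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the worker's entry point you might call `wait_until_ready(lambda: http_probe("http://127.0.0.1:3000/sdapi/v1/sd-models"))` before `runpod.serverless.start({"handler": handler})`. The port is an assumption (check which port your worker's A1111 listens on); `/sdapi/v1/sd-models` is one of A1111's API routes. Whether RunPod then keeps the worker out of rotation until the handler loop starts is also an assumption worth checking with RunPod.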
It's a bit annoying, but cold starts are always problematic in this setup.
You could also run something else, something more custom and barebones, to reduce the cold start, keeping only what you need from the web UI (probably just its API endpoints).