RunPod · 10mo ago
antoniog

Auto-scaling issues with A1111

Hey, I'm running an A1111 worker (https://github.com/ashleykleynhans/runpod-worker-a1111) on Serverless, but there is an issue with auto-scaling. A newly added worker becomes available (green) before A1111 has finished booting. Because of this, new requests are sent to the new worker immediately, and older workers are shut down if they haven't received a request within 5 seconds. This usually ends with all active workers shut down and a long queue building up, because none of the newly added workers have booted A1111 yet. I tried increasing the idle timeout, e.g. to 180 seconds, but then the workers never scale down.

Questions:
1. How can I make a worker become available (green) only once A1111 has booted?
2. Is it possible to also remove workers based on the queue delay setting? E.g. if requests wait in the queue for less than 10 seconds, 1 worker is removed.
2 Replies
Madiator2011 · 10mo ago
You can use the idle timeout setting. I'm not the best with serverless scaling, so you might be better off submitting a ticket on the website.
AMooMoo · 10mo ago
The exact scenario you're describing may not be possible, but afaik you might need to find some kind of "sweet spot" between the idle timeout and the queue delay setting. I believe you can also do this programmatically on your end by pinging the "ready" endpoint, or whatever it's called; once it reports ready, you know the worker is available. It's a bit annoying, but cold starts are always problematic in this case. You could also run something more custom and barebones to reduce the cold start, keeping only what you need from the webui, which I assume is its API endpoints.
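For the "ping until it's ready" idea, here is a minimal polling sketch in Python. It is not taken from the worker repo or from RunPod's docs: `http_probe`, `wait_until_ready`, and the example URL (port 3000 and the `/sdapi/v1/sd-models` path) are assumptions for illustration, so swap in whatever readiness check your setup actually exposes. Whether delaying handler startup this way changes when the worker shows as green is also untested here.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET to `url` answers with a 2xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, etc.: not ready yet.
        return False


def wait_until_ready(probe, interval=1.0, max_wait=300.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call `probe()` every `interval` seconds until it returns True.

    Returns True as soon as the probe succeeds, or False once `max_wait`
    seconds have passed without success.
    """
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage (URL and port are assumptions, adjust to your worker):
#
#   A1111_URL = "http://127.0.0.1:3000/sdapi/v1/sd-models"
#   if not wait_until_ready(lambda: http_probe(A1111_URL)):
#       raise RuntimeError("A1111 did not become ready in time")
#   # ...only now start accepting jobs, or send the first request.
```

The same loop works on the client side: pass a probe that checks whatever "ready" signal you have before sending real requests.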
