RunPod•13mo ago

Auto-scaling issues with A1111

Hey, I'm running an A1111 worker (https://github.com/ashleykleynhans/runpod-worker-a1111) on Serverless, but there is an issue with auto-scaling.

The problem is that a newly added worker becomes available (green) before A1111 has booted. Because of this, new requests are instantly sent to the new worker, and older workers are shut down if they haven't received any requests for 5 seconds. This usually results in all active workers shutting down and a long queue building up, because none of the newly added workers have booted A1111 yet. I tried increasing the idle timeout, e.g. to 180 seconds, but in that case the workers never scale down.

Questions:

1. How can I make the worker available (green) only once A1111 has booted?
2. Is it possible to also remove workers based on the queue delay setting? E.g. if a request waits in the queue for less than 10 seconds, 1 worker is removed.

2 Replies

Madiator2011•13mo ago

You can use the idle timeout setting. I'm not the best with serverless scaling, so you might be better off sending a ticket on the website.

AMooMoo•13mo ago

The exact scenario you are describing may not be possible, but afaik you might need to find some kind of sweet spot between the idle timeout and the queue delay setting.

You can also do this programmatically on your end, I believe, by pinging the "ready" endpoint (or whatever that thing is called); once it reports ready, you know the worker is available. A bit annoying, but cold starts are always problematic in this case.

You could also run something else, or something more custom and barebones, to help reduce the cold start, and keep only what you need from the webui, which is probably its API endpoints.
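One way to read the readiness-polling idea in the reply above, as a rough sketch only: have the worker block until the local A1111 API answers, and start accepting jobs only after that. The function names, the timeout values, and the example URL (port 3000 and the `/sdapi/v1/sd-models` path) are assumptions for illustration, not something confirmed in this thread, and whether this changes when RunPod shows the worker as green is not confirmed here either.

```python
import http.client
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if `url` answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Service not listening yet, still loading, or returned an error status.
        return False


def wait_for_service(url, max_wait=300.0, interval=1.0,
                     probe=http_probe, sleep=time.sleep):
    """Poll `url` until `probe` succeeds or `max_wait` seconds have elapsed.

    Returns True once the probe succeeds, False if the deadline passes first.
    `probe` and `sleep` are injectable so the loop can be tested offline.
    """
    deadline = time.monotonic() + max_wait
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Hypothetical local A1111 address; adjust to however your image starts it.
    ready = wait_for_service("http://127.0.0.1:3000/sdapi/v1/sd-models", max_wait=10.0)
    print("A1111 ready" if ready else "A1111 did not come up in time")
```

In a worker, a call like this would sit just before the serverless handler is started (for example before `runpod.serverless.start({"handler": handler})`), so that no job is processed while the webui is still loading. The same loop could also be pointed at an endpoint status URL from the client side, if that is closer to what the reply meant.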
