RunPod
•Created by antoniog on 4/2/2024 in #⚡|serverless
Auto-scaling issues with A1111
Hey, I'm running an A1111 worker (https://github.com/ashleykleynhans/runpod-worker-a1111) on Serverless but there is an issue with auto-scaling.
The problem is that a newly added worker becomes available (green) before A1111 has finished booting. Because of this, new requests are sent to the new worker immediately, while older workers are shut down if they haven't received any requests within 5 seconds. This usually results in all active workers being shut down and a long queue building up, because the newly added workers haven't booted A1111 yet.
I tried increasing the idle timeout, e.g. to 180 seconds, but in that case the workers never scale down.
Questions:
1. How can I make the worker available (green) only once A1111 has booted? (See the sketch after this list for the behaviour I have in mind.)
2. Is it also possible to remove workers based on the queue delay setting? E.g. if a request waits in the queue for less than 10 seconds, one worker is removed.
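For question 1, this is roughly the behaviour I have in mind: the handler only registers with the queue after the local A1111 API actually responds. A minimal sketch, assuming the worker exposes A1111 on port 3000 with the default /sdapi/v1 API (the real start script may differ):
```python
import time

import requests
import runpod

A1111_URL = "http://127.0.0.1:3000"  # assumed port; plain A1111 defaults to 7860


def wait_for_a1111(timeout: int = 300) -> None:
    """Block until the local A1111 API answers, so the worker only turns
    green after the model server has actually finished booting."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # A cheap endpoint that only responds once A1111 is fully up.
            if requests.get(f"{A1111_URL}/sdapi/v1/sd-models", timeout=5).ok:
                return
        except requests.RequestException:
            pass
        time.sleep(1)
    raise RuntimeError("A1111 did not become ready in time")


def handler(job):
    # Forward the serverless request to the local A1111 txt2img endpoint.
    response = requests.post(
        f"{A1111_URL}/sdapi/v1/txt2img", json=job["input"], timeout=600
    )
    return response.json()


if __name__ == "__main__":
    wait_for_a1111()  # only start taking jobs once A1111 is ready
    runpod.serverless.start({"handler": handler})
```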
10 replies
RunPod
•Created by antoniog on 12/22/2023 in #⚡|serverless
Issue with Request Count Scale Type
11 replies
RunPod
•Created by antoniog on 12/20/2023 in #⚡|serverless
Issues with building the new `worker-vllm` Docker Image
I've been using the previous version of worker-vllm with the AWQ model in production, and it recently turned out that there are problems with scaling it (all the requests are being sent to a single worker).
I've tried the newest version of worker-vllm. It works when using the pre-built Docker image, but I need to build a custom Docker image with a slightly modified vLLM (there's one minor update that negatively affects the quality of outputs).
Unfortunately, there are issues when building the Docker image (even without any modifications).
There are already 3 issues related to that on GitHub:
https://github.com/runpod-workers/worker-vllm/issues/21#issuecomment-1862188983
https://github.com/runpod-workers/worker-vllm/issues/25
https://github.com/runpod-workers/worker-vllm/issues/26
Could you please take a look at it, or provide a solution for scaling the previous version of worker-vllm? Thanks in advance!
6 replies
RunPod
•Created by antoniog on 12/19/2023 in #⚡|serverless
How to build worker-vllm Docker Image without a model inside?
I would like to build worker-vllm with a slightly customized vLLM. However, I don't want to bake the model into the image. Basically, it should work the same as your pre-built Docker image, which downloads the model to the Network Volume. Thanks!
https://github.com/runpod-workers/worker-vllm/
9 replies
RunPod
•Created by antoniog on 12/19/2023 in #⚡|serverless
Issue with worker-vllm and multiple workers
I'm using the previous version of worker-vllm (https://github.com/runpod-workers/worker-vllm/tree/4f792062aaea02c526ee906979925b447811ef48). There is an issue when more than one worker is running: since vLLM has an internal queue, all the requests are immediately passed to a single worker, and the second worker doesn't receive any requests. Is it possible to solve this? I've tried the new version of worker-vllm, but it has some other issues. Thanks!
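To illustrate the behaviour I'm after: cap how many requests a single worker accepts, so the remaining jobs stay in the endpoint queue and spill over to the other workers. A rough sketch using the concurrency_modifier option described in the current RunPod Python SDK docs (I'm not sure it exists in the SDK version this worker pins; MAX_CONCURRENCY and the handler body are placeholders):
```python
import runpod

# Placeholder: how many requests one worker should handle at a time.
MAX_CONCURRENCY = 4


def adjust_concurrency(current_concurrency: int) -> int:
    # Never let a single worker take more than MAX_CONCURRENCY requests,
    # so the remaining jobs stay queued and get routed to other workers.
    return min(current_concurrency + 1, MAX_CONCURRENCY)


async def handler(job):
    # Placeholder handler: the real worker would pass job["input"]
    # to the vLLM engine and return the generation result.
    return {"echo": job["input"]}


runpod.serverless.start(
    {
        "handler": handler,
        "concurrency_modifier": adjust_concurrency,
    }
)
```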
13 replies