Worker Errors Out When Sending Simultaneous Requests
I was benchmarking a serverless endpoint by sending 10 simultaneous requests to it. The endpoint has two active workers, and one of the workers keeps erroring out with the attached stack trace.
After the error occurs, the remaining 9 requests become stuck
In Progress,
and if I terminate the errored-out worker and spin up a new one, I get the same stack trace unless I manually clear out the In Progress
requests.
The endpoint is running a Llama 2 70B model with the image runpod/worker-vllm:0.2.3.
Solution:
Figured my issue out. I needed MAX_CONCURRENCY set to 5, otherwise all requests were going only to one node.
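For context, a back-of-the-envelope sketch of why this setting matters: each worker accepts up to MAX_CONCURRENCY requests at a time, so a burst of simultaneous requests larger than workers × MAX_CONCURRENCY will queue. The helper below is purely illustrative (it is not part of the worker-vllm image):

```python
import math

def workers_needed(requests_in_flight: int, max_concurrency: int) -> int:
    # Each worker absorbs up to max_concurrency simultaneous requests,
    # so a burst needs this many workers to avoid queueing.
    return math.ceil(requests_in_flight / max_concurrency)

# With a concurrency of 1 per worker, 10 simultaneous requests would
# need 10 workers, so 2 workers leave 8 requests queued.
print(workers_needed(10, 1))  # 10

# With MAX_CONCURRENCY=5, two workers can absorb all 10 requests.
print(workers_needed(10, 5))  # 2
```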
Here is the error stack