Why are my serverless endpoint requests waiting in the queue when there are free workers?
This keeps happening: when two people make a request at the same time, the second user's request waits in the queue until the first request completes, instead of being routed to another worker. I have 4 workers available on my endpoint, so that's not the issue. I set the queue delay to 1 second, the lowest possible value, but it doesn't change anything. Is the serverless endpoint supposed to work in production?
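For reference, this is roughly how I reproduce it, a minimal sketch with placeholder values for the endpoint URL, API key, and payload (swap in your own):

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

# Placeholders -- substitute your real endpoint URL and API key.
ENDPOINT_URL = "https://<your-serverless-endpoint>/runsync"
API_KEY = "<your-api-key>"

def call_endpoint(i: int) -> float:
    """Send one request and return its wall-clock latency in seconds."""
    start = time.time()
    resp = requests.post(
        ENDPOINT_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": {"text": f"test request {i}"}},
        timeout=120,
    )
    resp.raise_for_status()
    return time.time() - start

# Fire 3 requests at the same time; with multiple idle workers they should
# all finish in roughly the same time instead of queueing one after another.
with ThreadPoolExecutor(max_workers=3) as pool:
    latencies = list(pool.map(call_endpoint, range(3)))

for i, latency in enumerate(latencies):
    print(f"request {i}: {latency:.1f}s")
```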
5 Replies
For more context, I'm only trying to load an embedding model.
PM me your endpoint ID.
@flash-singh here's a screenshot of the logs for each worker when 3 requests were made at the same time. The third request ended up waiting 30 seconds before it sent an API request to DeepSeek, even though its worker had started up. I also switched the auto-scaling type to request count and set it to 1. Any ideas?
PM me the endpoint.
sent it