Why are my serverless endpoint requests waiting in the queue when there are free workers?

This has been happening: when two people make a request at the same time, the second user's request waits in the queue until the first request is completed, instead of being routed to another worker. I have 4 workers available on my endpoint, so that's not the issue. I set the queue delay to 1 second, since that's the lowest possible, but it doesn't do anything. Is the serverless endpoint supposed to work in production?
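For illustration, here is a minimal sketch that fires a few requests at the endpoint at the same time and reports how long each one took, to check whether they run in parallel or queue one behind the other. The /runsync URL format, the payload shape, and the ENDPOINT_ID/API_KEY placeholders are assumptions for this sketch, not details taken from the thread.

```python
# Rough reproduction sketch: send 3 requests simultaneously and time each one.
# If workers scale out as expected, they should finish in roughly the same
# wall-clock time instead of completing one after another.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT_ID = "your-endpoint-id"   # hypothetical placeholder
API_KEY = "your-api-key"           # hypothetical placeholder
URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"  # assumed URL format

def call(i: int) -> str:
    start = time.time()
    resp = requests.post(
        URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"input": {"text": f"request {i}"}},  # assumed input shape
        timeout=120,
    )
    elapsed = time.time() - start
    return f"request {i}: status={resp.status_code}, total={elapsed:.1f}s"

with ThreadPoolExecutor(max_workers=3) as pool:
    for line in pool.map(call, range(3)):
        print(line)
```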
5 Replies
gego144 (OP) · 2w ago
For more context, I'm only trying to load an embedding model.
flash-singh · 2w ago
PM me your endpoint ID.
gego144 (OP) · 2w ago
@flash-singh this is a screenshot of the logs of each worker when 3 requests were made at the same time; the third request ended up waiting 30 seconds before it sent an API request to DeepSeek, even though its worker had started up. I also switched the auto-scaling type to request count and set it to 1. Any ideas?
[Attachment: screenshot of worker logs]
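One way to confirm whether idle workers were actually available while the third request sat in the queue is to poll the endpoint's health route during the test. This is a hedged sketch: the /health path and the "workers"/"jobs" keys in the response are assumptions, not confirmed in this thread.

```python
# Hedged sketch: poll the endpoint health route while the concurrent requests
# run, printing worker and job counts over time.
import time

import requests

ENDPOINT_ID = "your-endpoint-id"   # hypothetical placeholder
API_KEY = "your-api-key"           # hypothetical placeholder
HEALTH_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health"  # assumed path

for _ in range(10):
    resp = requests.get(HEALTH_URL, headers={"Authorization": f"Bearer {API_KEY}"})
    data = resp.json()
    # Assumed response shape: {"workers": {"idle": N, "running": M}, "jobs": {...}}
    print(time.strftime("%H:%M:%S"), data.get("workers"), data.get("jobs"))
    time.sleep(2)
```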
flash-singh · 7d ago
PM me the endpoint.
gego144 (OP) · 7d ago
Sent it.