Maximum queue size
Hi, is there a limit on the maximum number of pending jobs in the queue with serverless endpoints, or are there any other queue size limitations?
I don't think there's any
There is a limit actually
100 * max workers
https://discord.com/channels/912829806415085598/948767517332107274/1203014459204050944
Oohh
thx!
Ok, max workers = 100. Can I have 1000 API connections in the IN_QUEUE status? It's an exaggerated example, but I would like to know how many actual IN_QUEUE endpoint calls can be waiting. Say I have a serverless endpoint that receives 1000 calls, but active workers is set to 0, max workers is set to 1, and the process takes ~5 minutes. Can it process all 1000 calls, one at a time, while the remaining calls remain in the QUEUE?
100 x 100 = 10,000 not 1000.
Read the link to flash-singh's message above, TTL also comes into play.
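As a worked example: with that formula, 1 max worker caps the queue at 100 × 1 = 100 jobs, so only 100 of those 1000 calls could sit IN_QUEUE at once. If you want to see how close you are to the cap, here's a minimal sketch, assuming the endpoint's /health route and the usual response shape (the endpoint ID and env var are placeholders, not official names):
```python
import os
import requests

# Placeholder values - use your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]

MAX_WORKERS = 1                     # as configured on the endpoint
QUEUE_LIMIT = 100 * MAX_WORKERS     # cap per the "100 x max workers" rule above

# Query the serverless health route (assumed response shape:
# {"jobs": {"inQueue": ..., "inProgress": ...}, "workers": {...}}).
resp = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
jobs = resp.json().get("jobs", {})

in_queue = jobs.get("inQueue", 0)
print(f"{in_queue}/{QUEUE_LIMIT} queue slots used")
if in_queue >= QUEUE_LIMIT:
    print("Queue is full - new submissions are likely to be rejected.")
```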
Ah, so it all comes down to the 'Idle Timeout' setting? What is the largest value 'Idle Timeout' can be set to? 86,400 seconds would be 24 hours.
No it doesn't
If you have only a single max worker, you can only have 100 requests in the queue.
simple mathematics
OK so with 1 active worker I can have a maximum of 99 IN_QUEUE connections waiting?
You should never have only 1 max worker under any circumstances anyway
With 1 max worker I don't spend any $ until my endpoint is being used.
I am not sure how the formula works with active workers; I assume it actually counts all workers, not just max workers.
@flash-singh would need to confirm though because I am just guessing, and @PatrickR should probably also document this info somewhere.
Ok thanks for the info you provided. Hopefully one of the people you have tagged will chime in.
Probably need them both to chime in, because this question about how many requests can be in the queue at most has come up more than once, so it would be great if it can be documented.
I would hope that items can remain in the QUEUE for some time. After all, they are tiny amounts of information (JSON). Not sure why they would want to time them out.
Wdym
It's already been made clear that they don't, and it depends on your workers
Why would you want thousands of requests in the queue, that makes no sense
Usually when you have a large number of requests in the queue, you need to add more max workers to process the requests, or else you have some issue with your handler
Yeap, and if your request takes a long time to process you can use pods btw
Even if it takes long, you can still use serverless, just make sure you don't try and use 1 max worker
I want to rip my eyes out when people have 1 max worker and complain that things are not working as expected
Chill
I honestly don't get it, it's not like you pay for them like active workers, so why set it to 1
And if you set it higher than 1, RunPod also gives you FREE "additional workers" to help with throttling
So there is absolutely no reason whatsoever to ever set it to 1. I don't even set it to 1 for debugging.
I agree completely, and I would do exactly that once I start getting users, but before that I would like to save $ and not have active workers running when I have 0 users. But to understand what is possible in that initial phase, I would need to know how many I can have in QUEUE. I would NEVER let there be 1000 in QUEUE, but I could imagine a time where there are on average 10 or so in QUEUE.
Just scale up max workers and you don't have a problem. I have my endpoint in production and have 30 max workers, zero active workers and never have any issues unless RunPod has issues.
I use network storage and sometimes there are networking issues and weird other incidents in those data centers.
With 0 active workers and 1 max worker, what happens is nothing runs until the endpoint is hit. Once that happens, a new serverless worker is spun up and responds. After that it goes back to idle. If I had 1 active worker, I would be charged for that endpoint sitting there waiting for requests, right?
Please don't ever set max workers to 1.
I have been trying to make this clear in all my messages above, just DON'T do it.
Other than wanting me to pay more what is the reason?
Yeah bro, just use more max workers. If you want, you can set the scale type to the other one (not queue delay).
Pay what more?
It doesn't make you pay more
Max workers are free
RunPod sets the default to 3 for a reason
Serverless charges you for running time only
They actually shouldn't allow you to set it less than 3 in my opinion.
Max workers, yes! I see. I could have 0 active workers and a max of 3. That makes more sense for prod. That way I can handle the connections and, again, only be charged when used.
Yeah..
Exactly, like I said, I have 0 active workers and 30 max workers in production.
I don't pay a cent for active workers, I only pay when my max workers kick in and handle requests.
Yes, I misunderstood what you were saying. Right now it is just me setting up/testing, so I never need more than 1, but 30 sounds good for max workers in production.
3 workers is also very few; if your app goes viral or something, you will have issues.
That really sheds light on the subject. With max workers set higher it really removes the concern, for me at least, about how many items can remain in QUEUE.
Btw, if you wanna make sure, just add an extra storage layer that stores the job information, for example something like the sketch below.
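A rough sketch of that idea, purely illustrative: it assumes the standard /run and /status/{id} routes, and the SQLite file and table layout are made-up placeholders:
```python
import os
import sqlite3
import requests

ENDPOINT = "https://api.runpod.ai/v2/your-endpoint-id"   # placeholder endpoint ID
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# Local job ledger so nothing is lost if a job ages out of the queue.
db = sqlite3.connect("jobs.db")
db.execute("CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, status TEXT, payload TEXT)")

def submit(payload: dict) -> str:
    """Queue a job asynchronously and record it locally."""
    r = requests.post(f"{ENDPOINT}/run", json={"input": payload}, headers=HEADERS, timeout=10)
    r.raise_for_status()
    job = r.json()
    db.execute("INSERT OR REPLACE INTO jobs VALUES (?, ?, ?)",
               (job["id"], job.get("status", "IN_QUEUE"), str(payload)))
    db.commit()
    return job["id"]

def refresh(job_id: str) -> str:
    """Poll the job status and keep the local record in sync."""
    r = requests.get(f"{ENDPOINT}/status/{job_id}", headers=HEADERS, timeout=10)
    r.raise_for_status()
    status = r.json()["status"]
    db.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    db.commit()
    return status
```
That way, even if a job drops out of the queue, you still have its ID and payload on your side and can resubmit it.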
It sounds like the confusion is over terms:
"Running" vs "Idle" -> A worker only costs while it is Running.
"Active" vs "Max" -> An active worker is "Always on shift" and so effectively always Running, but costs 40% less. Max workers are how many total might possibly be brought in to work. Max minus Active = Temp Workers, and they also are not costing anything unless they are Running.
When there is nothing to process - no queue at all - there is no worker Running, so no cost for the worker(s).
When the queue has ANYTHING in it, a worker will run - and cost money - to process the next thing in queue, up to the max number of workers.
If you intend to have a non-empty queue at all times, you should have enough "Active" workers to handle the normal load of the queue and cost the least. Then bigger loads will pull in "Temp workers" up to the Max count to handle the queue faster until it goes down.
that's accurate
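To put rough numbers on the Active vs Temp trade-off (the per-second price below is purely hypothetical; the only figure taken from the discussion above is the ~40% active-worker discount):
```python
# Hypothetical per-second flex-worker price, just to illustrate the trade-off.
FLEX_PRICE = 0.00040              # $/s while a temp ("flex") worker is Running
ACTIVE_PRICE = FLEX_PRICE * 0.6   # active workers cost ~40% less, but bill around the clock

SECONDS_PER_DAY = 24 * 60 * 60

# Scenario: 500 requests/day, each taking 60 s of worker time.
busy_seconds = 500 * 60

flex_only_cost = busy_seconds * FLEX_PRICE          # pay only while Running
one_active_cost = SECONDS_PER_DAY * ACTIVE_PRICE    # one always-on active worker

print(f"Flex only (0 active, N max):  ${flex_only_cost:.2f}/day")
print(f"One active worker, always on: ${one_active_cost:.2f}/day")
# Flex-only wins until the queue is busy enough that a worker would be
# Running most of the day anyway - then the 40% discount starts to pay off.
```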