RunPod6d ago
jvm-cb

Maximum queue size

Hi, is there a limit for maximum pending jobs in the queue with serverless endpoints or are there any other queue size limitations?
40 Replies
nerdylive
nerdylive6d ago
I don't think there's any
nerdylive
nerdylive6d ago
Oohh
jvm-cb
jvm-cb6d ago
thx!
Encyrption
Encyrption6d ago
Ok, max workers = 100. Can I have 1,000 API connections in the IN_QUEUE status? It's an exaggerated example, but I would like to know how many actual IN_QUEUE endpoint calls can be waiting. Say I have a serverless endpoint that receives 1,000 calls, but active workers is set to 0, max workers is set to 1, and the process takes ~5 minutes. Can it process all 1,000 calls, one at a time, while the remaining calls wait in the QUEUE?
digigoblin
digigoblin6d ago
100 x 100 = 10,000, not 1,000. Read the link to flash-singh's message above; TTL also comes into play.
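The rule of thumb quoted in this thread (queue capacity = max workers × 100) can be sketched as a quick back-of-the-envelope check. Note the 100-per-worker multiplier is the figure cited in the discussion, not an officially documented RunPod constant:

```python
# Rough queue-capacity estimate based on the rule quoted in this thread:
# each max worker is said to allow ~100 queued requests. This multiplier
# is an assumption from the discussion, not a documented RunPod constant.
PER_WORKER_QUEUE_LIMIT = 100  # assumed, per the thread

def max_queue_capacity(max_workers: int) -> int:
    """Estimated number of requests that can sit IN_QUEUE."""
    return max_workers * PER_WORKER_QUEUE_LIMIT

print(max_queue_capacity(100))  # 100 max workers -> 10,000 queued requests
print(max_queue_capacity(1))    # 1 max worker    -> 100 queued requests
```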
Encyrption
Encyrption6d ago
Ah, so it all comes down to the 'Idle Timeout' setting? What is the largest value 'Idle Timeout' can take? 86,400 seconds would be 24 hours.
digigoblin
digigoblin6d ago
No, it doesn't. If you have only a single max worker, you can only have 100 requests in the queue. Simple mathematics.
Encyrption
Encyrption6d ago
OK so with 1 active worker I can have a maximum of 99 IN_QUEUE connections waiting?
digigoblin
digigoblin6d ago
You should never have only 1 max worker under any circumstances anyway
Encyrption
Encyrption6d ago
with 1 max worker I don't spend any $ until my endpoint is being used.
digigoblin
digigoblin6d ago
I am not sure how the formula works with active workers; I assume it actually counts all workers, not just max workers. @flash-singh would need to confirm, though, because I am just guessing, and @PatrickR should probably also document this info somewhere.
Encyrption
Encyrption6d ago
Ok thanks for the info you provided. Hopefully one of the people you have tagged will chime in.
digigoblin
digigoblin6d ago
Probably need them both to chime in. This question about how many requests can be in the queue has come up more than once, so it would be great if it could be documented.
Encyrption
Encyrption5d ago
I would hope that items can remain in the QUEUE for some time. After all, they are tiny amounts of information (JSON). Not sure why they would want to time them out.
nerdylive
nerdylive5d ago
Wdym
digigoblin
digigoblin5d ago
It's already been made clear that they don't, and it depends on your workers. Why would you want thousands of requests in the queue? That makes no sense. Usually when you have a large number of requests in the queue, you need to add more max workers to process them, or else you have some issue with your handler.
nerdylive
nerdylive5d ago
Yep, and if your request takes a long time to process, you can use pods btw
digigoblin
digigoblin5d ago
Even if it takes long, you can still use serverless; just make sure you don't try to use 1 max worker. I want to rip my eyes out when people have 1 max worker and complain that things are not working as expected.
nerdylive
nerdylive5d ago
Chill
digigoblin
digigoblin5d ago
I honestly don't get it. It's not like you pay for them like active workers, so why set it to 1? And if you set it higher than 1, RunPod also gives you FREE "additional workers" to help with throttling. So there is absolutely no reason whatsoever to ever set it to 1. I don't even set it to 1 for debugging.
Encyrption
Encyrption5d ago
I agree completely and I would do exactly that once I start getting users but before that I would like to save $ and not have active workers running when I have 0 users. But to understand what is possible in that initial phase I would need to know how many I can have in QUEUE. I would NEVER let there be 1000 in QUEUE but I could imagine a time where there is on average 10 or so in QUEUE.
digigoblin
digigoblin5d ago
Just scale up max workers and you don't have a problem. I have my endpoint in production with 30 max workers and zero active workers, and I never have any issues unless RunPod has issues. I use network storage, and sometimes there are networking issues and other weird incidents in those data centers.
Encyrption
Encyrption5d ago
With 0 active workers and 1 max worker, nothing runs until the endpoint is hit. Once that happens, a new serverless worker is spun up and responds. After that it goes back to idle. If I had 1 active worker, I would be charged for that worker sitting there waiting for requests, right?
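The scale-from-zero behavior described here is what a standard serverless handler relies on. A minimal sketch of the handler pattern from the `runpod` Python SDK (the payload field `prompt` is a hypothetical example):

```python
# Minimal RunPod serverless handler sketch. With 0 active workers, nothing
# runs (and nothing bills) until a request arrives; then a worker spins up,
# runs this handler, and idles out again.
def handler(job):
    # job["input"] carries the JSON payload sent to the endpoint's /run route.
    prompt = job["input"].get("prompt", "")
    # ... the ~5-minute job would happen here ...
    return {"echo": prompt}

# Local check of the handler logic:
print(handler({"input": {"prompt": "hi"}}))  # {'echo': 'hi'}

# In the actual worker image you would register it with the SDK:
# import runpod
# runpod.serverless.start({"handler": handler})
```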
digigoblin
digigoblin5d ago
Please don't ever set max workers to 1. I have been trying to make this clear in all my messages above: just DON'T do it.
Encyrption
Encyrption5d ago
Other than wanting me to pay more what is the reason?
nerdylive
nerdylive5d ago
Yeah bro, just use more max workers; if you want, you can set the scale type to the other one (not queue delay)
digigoblin
digigoblin5d ago
Pay what more?
nerdylive
nerdylive5d ago
It doesn't make you pay more
digigoblin
digigoblin5d ago
Max workers are free. RunPod sets the default to 3 for a reason.
nerdylive
nerdylive5d ago
Serverless charges from your running time only
digigoblin
digigoblin5d ago
They actually shouldn't allow you to set it less than 3 in my opinion.
Encyrption
Encyrption5d ago
Max workers, yes! I see. I could have 0 active workers and a max of 3. That makes more sense for prod. That way I can handle the connections and, again, only be charged when used.
nerdylive
nerdylive5d ago
Yeah..
digigoblin
digigoblin5d ago
Exactly, like I said, I have 0 active workers and 30 max workers in production. I don't pay a cent for active workers, I only pay when my max workers kick in and handle requests.
Encyrption
Encyrption5d ago
Yes, I misunderstood what you were saying. Right now it is just me setting up/testing, so I never need more than 1, but 30 sounds good for max workers in production.
digigoblin
digigoblin5d ago
3 workers is also very few; if your app goes viral or something, you will have issues.
Encyrption
Encyrption5d ago
That really sheds light on the subject. With max workers set higher it really removes the concern, for me at least, about how many items can remain in QUEUE.
nerdylive
nerdylive5d ago
Btw, if you want to make sure, just add extra storage that stores the jobs' information
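A minimal sketch of that idea: persist each job's payload in your own store before submitting it, so a job that falls out of the queue can be re-submitted later. Here an in-memory SQLite table stands in for the extra storage; the table layout and job IDs are hypothetical:

```python
import json
import sqlite3

# Hypothetical sketch: record every job's payload locally before submitting
# it to the endpoint, so nothing is lost if a queued job times out.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, payload TEXT, status TEXT)")

def record_job(job_id: str, payload: dict) -> None:
    db.execute("INSERT INTO jobs VALUES (?, ?, 'PENDING')",
               (job_id, json.dumps(payload)))

def mark_done(job_id: str) -> None:
    db.execute("UPDATE jobs SET status = 'DONE' WHERE id = ?", (job_id,))

def pending_jobs() -> list[str]:
    # Anything still PENDING can be re-submitted to the endpoint.
    return [row[0] for row in
            db.execute("SELECT id FROM jobs WHERE status = 'PENDING'")]

record_job("job-1", {"prompt": "hello"})
record_job("job-2", {"prompt": "world"})
mark_done("job-1")
print(pending_jobs())  # ['job-2']
```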
Charixfox
Charixfox5d ago
It sounds like the confusion is over terms:
"Running" vs "Idle" -> A worker only costs while it is Running.
"Active" vs "Max" -> An active worker is "always on shift" and so effectively always Running, but costs 40% less. Max workers are how many total might possibly be brought in to work. Max minus Active = Temp Workers, and they also are not costing anything unless they are Running.
When there is nothing to process (no queue at all), no worker is Running, so there is no cost for the worker(s). When the queue has ANYTHING in it, a worker will run (and cost money) to process the next thing in the queue, up to the max number of workers.
If you intend to have a non-empty queue at all times, you should have enough "Active" workers to handle the normal load of the queue at the lowest cost. Then bigger loads will pull in "Temp workers", up to the Max count, to handle the queue faster until it goes down.
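That cost model can be put into a worked example. The per-second rate below is hypothetical (real RunPod pricing varies by GPU type); the 40% active-worker discount is the figure mentioned in this thread:

```python
# Worked example of the cost model described above. BASE_RATE is a
# hypothetical $/second figure; the 40% active-worker discount is the
# number quoted in this thread. Temp workers bill only while Running.
BASE_RATE = 0.00040          # hypothetical $/second for one worker
ACTIVE_DISCOUNT = 0.40       # active workers cost 40% less, per the thread

def hourly_cost(active_workers: int, temp_workers: int,
                temp_busy_fraction: float) -> float:
    """Estimated $/hour: active workers always on, temp workers only while busy."""
    active = active_workers * BASE_RATE * (1 - ACTIVE_DISCOUNT) * 3600
    temp = temp_workers * BASE_RATE * temp_busy_fraction * 3600
    return round(active + temp, 4)

# 0 active, 3 temp workers busy 10% of the hour: pay only for the busy time.
print(hourly_cost(0, 3, 0.10))  # 0.432
# 1 always-on active worker, no temp work: pay the discounted always-on rate.
print(hourly_cost(1, 0, 0.0))   # 0.864
```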