Maximum queue size
Hi, is there a limit on the maximum number of pending jobs in the queue with serverless endpoints, or are there any other queue size limitations?
I don't think there's any
There is a limit actually
100 * max workers
https://discord.com/channels/912829806415085598/948767517332107274/1203014459204050944
Oohh
thx!
Ok, max workers = 100. Can I have 1000 API connections in the IN_QUEUE status? It's an exaggerated example, but I would like to know how many actual IN_QUEUE endpoint calls can be waiting. Say I have a serverless endpoint that receives 1000 calls, but active workers is set to 0, max workers is set to 1, and the process takes ~5 minutes. Can it process all 1000 calls, one at a time, while the remaining calls remain in the QUEUE?
100 x 100 = 10,000 not 1000.
Read the link to flash-singh's message above, TTL also comes into play.
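As a worked example: with that formula, 1 max worker caps the queue at 100 × 1 = 100 jobs, so only 100 of those 1000 calls could sit IN_QUEUE at once. If you want to see how close you are to the cap, here's a minimal sketch, assuming the endpoint's /health route and the usual response shape (the endpoint ID and env var are placeholders, not official names):
```python
import os
import requests

# Placeholder values - use your own endpoint ID and API key.
ENDPOINT_ID = "your-endpoint-id"
API_KEY = os.environ["RUNPOD_API_KEY"]

MAX_WORKERS = 1                     # as configured on the endpoint
QUEUE_LIMIT = 100 * MAX_WORKERS     # cap per the "100 x max workers" rule above

# Query the serverless health route (assumed response shape:
# {"jobs": {"inQueue": ..., "inProgress": ...}, "workers": {...}}).
resp = requests.get(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
jobs = resp.json().get("jobs", {})

in_queue = jobs.get("inQueue", 0)
print(f"{in_queue}/{QUEUE_LIMIT} queue slots used")
if in_queue >= QUEUE_LIMIT:
    print("Queue is full - new submissions are likely to be rejected.")
```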
Ah, so it all comes down to the 'Idle Timeout' setting? What is the largest value 'Idle Timeout' can be set to? 86,400 seconds would be 24 hours.
No it doesn't
If you have only a single max worker, you can only have 100 requests in the queue.
simple mathematics
OK so with 1 active worker I can have a maximum of 99 IN_QUEUE connections waiting?
You should never have only 1 max worker under any circumstances anyway
With 1 max worker I don't spend any $ until my endpoint is being used.
I am not sure how the formula works with active workers; I assume it actually counts all workers, not just max workers.
@flash-singh would need to confirm though because I am just guessing, and @PatrickR should probably also document this info somewhere.
Ok thanks for the info you provided. Hopefully one of the people you have tagged will chime in.
Probably need them both to chime in, because this question about how many requests can be in the queue at most has come up more than once, so it would be great if it can be documented.
I would hope that items can remain in the QUEUE for some time. After all, they are tiny amounts of information (JSON). Not sure why they would want to time them out.
Wdym
It's already been made clear that they don't, and it depends on your workers
Why would you want thousands of requests in the queue, that makes no sense
Usually when you have a large number of requests in the queue, you need to add more max workers to process the requests, or else you have some issue with your handler
Yeap, and if your request takes a long time to process you can use pods btw
Even if it takes long, you can still use serverless, just make sure you don't try and use 1 max worker
I want to rip my eyes out when people have 1 max worker and complain that things are not working as expected
Chill
I honestly don't get it, it's not like you pay for them like active workers, so why set it to 1
And if you set it higher than 1, RunPod also gives you FREE "additional workers" to help with throttling
So there is absolutely no reason whatsoever to ever set it to 1. I don't even set it to 1 for debugging.
I agree completely, and I would do exactly that once I start getting users, but before that I would like to save $ and not have active workers running when I have 0 users. But to understand what is possible in that initial phase, I would need to know how many I can have in QUEUE. I would NEVER let there be 1000 in QUEUE, but I could imagine a time where there are on average 10 or so in QUEUE.
Just scale up max workers and you don't have a problem. I have my endpoint in production and have 30 max workers, zero active workers and never have any issues unless RunPod has issues.
I use network storage and sometimes there are networking issues and weird other incidents in those data centers.
With 0 active workers and 1 max worker, what happens is nothing runs until the endpoint is hit. Once that happens, a new serverless worker is spun up and responds. After that it goes back to idle. If I had 1 active worker, I would be charged for that endpoint sitting there waiting for requests, right?
Please don't ever set max workers to 1.
I have been trying to make this clear in all my messages above, just DON'T do it.
Other than wanting me to pay more what is the reason?
Yeah bro, just use more max workers. If you want, you can set the scale type to the other one (not queue delay).
Pay what more?
It doesn't make you pay more
Max workers are free
RunPod sets the default to 3 for a reason
Serverless charges you for running time only
They actually shouldn't allow you to set it less than 3 in my opinion.
Max workers, yes! I see. I could have 0 active workers and a max of 3. That makes more sense for prod. That way I can handle the connections and, again, only be charged when used.
Yeah..
Exactly, like I said, I have 0 active workers and 30 max workers in production.
I don't pay a cent for active workers, I only pay when my max workers kick in and handle requests.
Yes, I misunderstood what you were saying. Right now it is just me setting up/testing, so I never need more than 1, but 30 sounds good for max workers in production.
3 workers is also very few; if your app goes viral or something, you will have issues.
That really sheds light on the subject. With max workers set higher it really removes the concern, for me at least, about how many items can remain in QUEUE.
Btw, if you wanna make sure, just add an extra storage layer that stores the job information, for example something like the sketch below.
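A rough sketch of that idea, purely illustrative: it assumes the standard /run and /status/{id} routes, and the SQLite file and table layout are made-up placeholders:
```python
import os
import sqlite3
import requests

ENDPOINT = "https://api.runpod.ai/v2/your-endpoint-id"   # placeholder endpoint ID
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# Local job ledger so nothing is lost if a job ages out of the queue.
db = sqlite3.connect("jobs.db")
db.execute("CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, status TEXT, payload TEXT)")

def submit(payload: dict) -> str:
    """Queue a job asynchronously and record it locally."""
    r = requests.post(f"{ENDPOINT}/run", json={"input": payload}, headers=HEADERS, timeout=10)
    r.raise_for_status()
    job = r.json()
    db.execute("INSERT OR REPLACE INTO jobs VALUES (?, ?, ?)",
               (job["id"], job.get("status", "IN_QUEUE"), str(payload)))
    db.commit()
    return job["id"]

def refresh(job_id: str) -> str:
    """Poll the job status and keep the local record in sync."""
    r = requests.get(f"{ENDPOINT}/status/{job_id}", headers=HEADERS, timeout=10)
    r.raise_for_status()
    status = r.json()["status"]
    db.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    db.commit()
    return status
```
That way, even if a job drops out of the queue, you still have its ID and payload on your side and can resubmit it.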
It sounds like the confusion is over terms:
"Running" vs "Idle" -> A worker only costs while it is Running.
"Active" vs "Max" -> An active worker is "Always on shift" and so effectively always Running, but costs 40% less. Max workers are how many total might possibly be brought in to work. Max minus Active = Temp Workers, and they also are not costing anything unless they are Running.
When there is nothing to process - no queue at all - there is no worker Running, so no cost for the worker(s).
When the queue has ANYTHING in it, a worker will run - and cost money - to process the next thing in queue, up to the max number of workers.
If you intend to have a non-empty queue at all times, you should have enough "Active" workers to handle the normal load of the queue and cost the least. Then bigger loads will pull in "Temp workers" up to the Max count to handle the queue faster until it goes down.
that's accurate
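To put rough numbers on the Active vs Temp trade-off (the per-second price below is purely hypothetical; the only figure taken from the discussion above is the ~40% active-worker discount):
```python
# Hypothetical per-second flex-worker price, just to illustrate the trade-off.
FLEX_PRICE = 0.00040              # $/s while a temp ("flex") worker is Running
ACTIVE_PRICE = FLEX_PRICE * 0.6   # active workers cost ~40% less, but bill around the clock

SECONDS_PER_DAY = 24 * 60 * 60

# Scenario: 500 requests/day, each taking 60 s of worker time.
busy_seconds = 500 * 60

flex_only_cost = busy_seconds * FLEX_PRICE          # pay only while Running
one_active_cost = SECONDS_PER_DAY * ACTIVE_PRICE    # one always-on active worker

print(f"Flex only (0 active, N max):  ${flex_only_cost:.2f}/day")
print(f"One active worker, always on: ${one_active_cost:.2f}/day")
# Flex-only wins until the queue is busy enough that a worker would be
# Running most of the day anyway - then the 40% discount starts to pay off.
```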