One request = one worker
How can I configure my endpoint so that one request is equal to one worker, and one worker does not complete more than one request within a certain timeframe?
My workload is bursty and requires all of the workers to be available at once. However, my endpoint does not do that and takes a long time to start all the workers I need. In addition, workers are sometimes reused instead of a new instance being created, which I do not want.
I think if you disable Flashboot then after each request it will be a cold boot. But are you sure you want to do that? It will dramatically increase the time a job stays IN_QUEUE.
Only way I know to guarantee that you have X number of workers up ready to respond is to make them active.
I don't want it to be a cold boot
Is it possible to have x workers active for a two minute period and then turn it off?
I tried with active workers but it takes a long time to turn all the workers on
I need 50 workers to be available within 10 seconds
You can adjust your active and max workers... just click the hamburger icon on the endpoint and then click edit and you can adjust.
Have you tried using runpodctl or graphQL to provision your workers? Might have more control / speed with scripting rather than manual.
Beyond scripting, I think your boot up time will depend mainly on the size of your docker image.
Let me look into that, I didn't know serverless worked with the GraphQL API too
I couldn't find any info on doing it programmatically
Only serverless endpoints
serverless endpoint = worker
but my problem is the serverless endpoint puts requests into a queue
and doesn't always start a new worker
I think they will all have to be active
Do you have the ability to use 50 workers? I'm currently limited to 35.
I have requested further worker limit increases
sweet!
I mean I already have a limit of more than 50 workers
can't figure out how to utilize all of them at the same time is the problem
if you want all of them starting without putting in jobs, the easiest way is to change active workers to 50 and then reduce them after a few minutes; you can do this using our GraphQL API
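For anyone wanting to script that, here is a minimal sketch of the approach in Python. The mutation and field names (saveEndpoint, workersMin, workersMax) and the query-param auth are assumptions from memory and should be checked against the GraphQL docs before use:

```python
import time
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"      # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"     # placeholder
GRAPHQL_URL = f"https://api.runpod.io/graphql?api_key={API_KEY}"

# Assumption: saveEndpoint accepts workersMin (active workers) on an existing endpoint.
MUTATION = """
mutation SetActiveWorkers($input: EndpointInput!) {
  saveEndpoint(input: $input) { id workersMin workersMax }
}
"""

def set_active_workers(count: int) -> None:
    """Set the number of always-on (active) workers on the endpoint."""
    resp = requests.post(
        GRAPHQL_URL,
        json={"query": MUTATION,
              "variables": {"input": {"id": ENDPOINT_ID, "workersMin": count}}},
        timeout=30,
    )
    resp.raise_for_status()

# Spin 50 active workers up for the burst, then scale back down.
set_active_workers(50)
time.sleep(120)   # the two-minute window mentioned above
set_active_workers(0)
```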
why don't you want the worker to take the next job if the worker is free?
so my job has two components: a preparation phase (my program needs to build kernel modules for the specific machine, which takes about 20-30s) and a deadline, after which the workers just shut down because they're no longer needed
I noticed that the way it currently works, once one worker finishes the preparation, it ends up taking all the requests in the queue
which instantly error out because the deadline has already passed and those jobs are done
and this kills the other workers that are still in the middle of building their kernel modules, because there's nothing left in the queue waiting for them
i also noticed this behavior:
where workers are started sequentially rather than in parallel
scale type is already set to a request count of 1, because I want one request to create a new worker
how can I make my workers start up in parallel?
I'm limited to about 50 workers starting within 20 seconds. Because my workload is bursty, I want to have 75+ workers start up asynchronously within 10-15 seconds
https://docs.runpod.io/serverless/workers/handlers/handler-concurrency
Would this work for you? It means a single worker would handle multiple jobs. I'm not sure how heavy each job is, though. Also, which GPU are you using right now?
Concurrent Handlers | RunPod Documentation
RunPod's concurrency functionality enables efficient task handling through asynchronous requests, allowing a single worker to manage multiple tasks concurrently. The concurrency_modifier configures the worker's concurrency level to optimize resource consumption and performance.
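For reference, a minimal sketch of what the linked concurrency_modifier setup looks like with the RunPod Python SDK; the handler body and the fixed concurrency of 4 are illustrative, not from this thread:

```python
import runpod

async def handler(job):
    # Illustrative: do the actual work for one job here.
    return {"output": job["input"]}

def adjust_concurrency(current_concurrency: int) -> int:
    # Return how many jobs this worker may run at once.
    # A real implementation could raise or lower this based on load.
    return 4

# concurrency_modifier lets a single worker pull multiple jobs concurrently.
runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": adjust_concurrency,
})
```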
I am using 4090s. This would not work; by the time a worker is done with one job, the window to do the next would have already passed
Is there some sort of super high priority flag for a job request that I can send for it to bypass the queue and instantly spin up the worker?
https://docs.runpod.io/serverless/endpoints/send-requests#execution-policies
You can set a job to low priority, not sure if this helps in your case
Send a request | RunPod Documentation
Learn how to construct a JSON request body to send to your custom endpoint, including optional inputs for webhooks, execution policies, and S3-compatible storage, to optimize job execution and resource management.
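For context, an execution policy goes in the request body alongside the input. A rough sketch follows; the field names (executionTimeout, ttl) and millisecond units are as I recall them from the linked page and should be double-checked, and the endpoint ID and input are placeholders:

```python
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"      # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"     # placeholder

payload = {
    "input": {"prompt": "example"},      # your normal job input
    "policy": {
        "executionTimeout": 60_000,      # kill the job if it runs longer than 60s
        "ttl": 120_000,                  # drop the job if it sits in queue past 2 min
    },
}

resp = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)
print(resp.json())
```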
that's the opposite of what I want to do
so you're using request count to scale? and when you need to scale, you submit 50-100 requests very quickly and need the scaling to catch up fast? I'll review this logic for request count scaling
They want to spin up 50 workers as quickly as possible to process 50 requests and then shut them all down. They want to ensure that each worker processes only 1 request, so that all 50 run in parallel.
I think if you used the right scaling type on the endpoint and no concurrency settings on the handler it should do what you want, and maybe RunPod's scaling is a bit slow (not sure)
I did not make any adjustments to the concurrency settings. I have my workers scaling at a request count of 1
yes this is right, what I'm thinking is that request count scaling would mean that if I have 50 jobs in the queue then it would spin up 50 workers at once
but from what I observed, the current scaling, even with a request count scale of 1, takes time to start up
for 50 workers, it takes about 30 seconds
So the scale up is slow?
yea the scale up is slow
I want all 50 workers to be started up at the same time, so that 5 seconds after my batch of requests I have 50 workers ready
instead of 1 new worker every few seconds
is there a way to do it?
Yeah let's wait for staff here
I plan to review the scaling algo and will get back to you
You could call that algo batch_mode LOL š
thank you :TOCheer:
looks like the scaling algo for request count scale is serial, workers are started one after the other; I'll plan optimizations for this in the coming week
our queue delay scale method is async, you can try that out and set the queue delay scale time to the lowest possible, 1 second, which will spin up much faster; please provide some feedback if you decide to test that out @1AndOnlyPika
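If you want to flip that setting from a script rather than the console, it would presumably be the same saveEndpoint mutation sketched earlier; the scalerType/scalerValue field names and the "QUEUE_DELAY" value are my guesses and need to be verified against the schema:

```python
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"      # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"     # placeholder
GRAPHQL_URL = f"https://api.runpod.io/graphql?api_key={API_KEY}"

# Assumption: scalerType/scalerValue select the scaling strategy;
# "QUEUE_DELAY" with a value of 1 would mean "scale up if a job waits > 1 second".
MUTATION = """
mutation SetScaler($input: EndpointInput!) {
  saveEndpoint(input: $input) { id scalerType scalerValue }
}
"""

requests.post(
    GRAPHQL_URL,
    json={"query": MUTATION,
          "variables": {"input": {"id": ENDPOINT_ID,
                                  "scalerType": "QUEUE_DELAY",
                                  "scalerValue": 1}}},
    timeout=30,
).raise_for_status()
```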
this odd situation happened:
there were requests sitting in the queue, but no workers being spun up to handle them
the first few workers do start up much faster, but it never reaches 1 request = 1 worker
well it does but not evenly, the times end up looking like this
can you pm me endpoint id
sent thanks
yay looks like the fix was pushed out, thanks runpod!
Nice, how fast is it now?
Getting all the workers I need started at once in < 2s if they're ready
Ooh
That's great! Are you using GraphQL?
Eh, GraphQL? What for?
For starting his 50 workers simultaneously
Oh, I guess not but let's see
yep, pushed the update an hour ago, it's for serverless using request count scale
no just the regular /run endpoint
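For completeness, a sketch of firing a burst of jobs at the plain /run endpoint in parallel, which is roughly the pattern described in this thread; the endpoint ID, input payload, and batch size of 50 are placeholders:

```python
from concurrent.futures import ThreadPoolExecutor
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"      # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"     # placeholder
RUN_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/run"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

def submit(i: int) -> str:
    """Submit one job and return its queued job id."""
    resp = requests.post(RUN_URL, headers=HEADERS,
                         json={"input": {"task_id": i}}, timeout=30)
    resp.raise_for_status()
    return resp.json()["id"]

# Fire 50 requests at once so the endpoint sees the whole burst immediately.
with ThreadPoolExecutor(max_workers=50) as pool:
    job_ids = list(pool.map(submit, range(50)))

print(job_ids)
```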