RunPod•3mo ago
1AndOnlyPika

One request = one worker

How can I configure my endpoint so that one request equals one worker, and one worker does not complete more than one request within a certain timeframe? My workload is bursty and requires all of the workers to be available at once. However, my endpoint does not give me that and takes a long time to start all the workers I need. In addition, workers are sometimes reused instead of a new instance being created, which I do not want.
43 Replies
Encyrption•3mo ago
I think if you disable FlashBoot, then after each request it will be a cold boot. But are you sure you want to do that? It will dramatically increase the time a job stays IN_QUEUE. The only way I know to guarantee that you have X workers up and ready to respond is to make them active.
1AndOnlyPikaOP•3mo ago
I don't want it to be a cold boot. Is it possible to have X workers active for a two-minute period and then turn them off? I tried with active workers, but it takes a long time to turn them all on, and I need 50 workers to be available within 10 seconds.
Encyrption•3mo ago
You can adjust your active and max workers... just click the hamburger icon on the endpoint, then click edit, and you can adjust them. Have you tried using runpodctl or GraphQL to provision your workers? You might have more control/speed with scripting than doing it manually. Beyond scripting, I think your boot-up time will depend mainly on the size of your Docker image.
1AndOnlyPikaOP•3mo ago
Let me look into that; I didn't know serverless worked with the GraphQL API too. I couldn't find any info on doing it programmatically, only on serverless endpoints.
Encyrption•3mo ago
serverless endpoint = worker
1AndOnlyPikaOP•3mo ago
But my problem is that a request to the serverless endpoint gets put into a queue and doesn't always start a new worker.
Encyrption•3mo ago
I think they will all have to be active. Do you have the ability to use 50 workers? I'm currently limited to 35.
1AndOnlyPikaOP•3mo ago
I have requested further worker increases.
Encyrption•3mo ago
sweet!
1AndOnlyPikaOP•3mo ago
I mean, I already have a limit of more than 50 workers; the problem is that I can't figure out how to utilize all of them at the same time.
flash-singh•3mo ago
If you want all of them starting without putting in jobs, the easiest way is to change active workers to 50 and then reduce them after a few minutes; you can do this using our GraphQL API. Why don't you want a worker to take the next job if the worker is free?
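A minimal sketch of that approach in Python, assuming the saveEndpoint mutation and its workersMin field from RunPod's GraphQL spec; the live API may require more endpoint fields in the input than shown here, and the API key and endpoint ID are placeholders:

```python
import requests

API_KEY = "YOUR_RUNPOD_API_KEY"    # placeholder
ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder

def set_active_workers(count: int) -> dict:
    # saveEndpoint/workersMin are assumed from RunPod's GraphQL spec; the
    # real mutation may require additional endpoint fields.
    mutation = f"""
    mutation {{
      saveEndpoint(input: {{ id: "{ENDPOINT_ID}", workersMin: {count} }}) {{
        id
        workersMin
      }}
    }}
    """
    resp = requests.post(
        "https://api.runpod.io/graphql",
        params={"api_key": API_KEY},
        json={"query": mutation},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

set_active_workers(50)  # burst window: all workers warm
# ... run the batch, then a few minutes later ...
set_active_workers(0)   # scale back down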
1AndOnlyPikaOP•3mo ago
So my job has two components: a preparation phase (my program needs to build kernel modules for the specific machine, which takes about 20-30s) and a deadline, after which the workers just shut down because the work is no longer needed. I noticed that with the current behavior, once one worker finishes its preparation, it ends up taking all the requests in the queue, which instantly error out because the deadline has already passed. And because those jobs are then done, there is nothing left in the queue, which kills the other workers that are still in the middle of building their kernel modules.
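One way to keep a freed-up worker from burning through stale queue entries is to have the handler check the deadline before doing any work. A hypothetical sketch; the deadline input field is an assumption the client would supply, not a RunPod feature:

```python
import time
import runpod

def handler(job):
    # "deadline" (epoch seconds) is a hypothetical field the client would
    # add to each job's input; it is not a RunPod API feature.
    deadline = job["input"].get("deadline")
    if deadline is not None and time.time() > deadline:
        # The job sat in the queue past its useful window; bail out cheaply
        # instead of spending 20-30s building kernel modules for nothing.
        return {"error": "deadline passed while the job was queued"}

    # ... 20-30s preparation (machine-specific kernel module build) ...
    # ... actual work ...
    return {"status": "done"}

runpod.serverless.start({"handler": handler})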
1AndOnlyPikaOP•3mo ago
I also noticed this behavior:
[screenshot attached]
1AndOnlyPikaOP•3mo ago
Workers are started sequentially rather than in parallel. The scale type is already set to a request count of 1, because I want each request to create a new worker. How can I make my workers start up in parallel? Currently I'm limited to about 50 workers starting within 20 seconds; because my workload is bursty, I want 75+ workers to start up asynchronously within 10-15 seconds.
yhlong00000•3mo ago
https://docs.runpod.io/serverless/workers/handlers/handler-concurrency Would this work for you? It means a single worker would handle multiple jobs. I'm not sure how heavy each job is, though. Also, which GPU are you using right now?
Concurrent Handlers | RunPod Documentation
RunPod's concurrency functionality enables efficient task handling through asynchronous requests, allowing a single worker to manage multiple tasks concurrently. The concurrency_modifier configures the worker's concurrency level to optimize resource consumption and performance.
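For reference, the linked docs configure this through a concurrency_modifier callback passed to runpod.serverless.start; a minimal sketch, where the returned concurrency of 4 is a made-up number for illustration:

```python
import asyncio
import runpod

def concurrency_modifier(current_concurrency: int) -> int:
    # Called by the SDK to decide how many jobs this worker may run at once;
    # 4 is an illustrative value, not a recommendation.
    return 4

async def handler(job):
    await asyncio.sleep(1)  # stand-in for real async work
    return {"echo": job["input"]}

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})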
1AndOnlyPikaOP•3mo ago
I am using 4090s. This would not work; by the time a worker is done with one job, the window to do the next would have already passed. Is there some sort of super-high-priority flag I can send with a job request so it bypasses the queue and instantly spins up a worker?
yhlong00000•3mo ago
https://docs.runpod.io/serverless/endpoints/send-requests#execution-policies You can set a job to low priority; not sure if this helps in your case.
Send a request | RunPod Documentation
Learn how to construct a JSON request body to send to your custom endpoint, including optional inputs for webhooks, execution policies, and S3-compatible storage, to optimize job execution and resource management.
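For reference, an execution policy rides along in the /run request body; a sketch with field names taken from the linked docs (the endpoint ID, API key, and values are placeholders). Note that ttl, which expires a job that sits in the queue too long, looks closer to the deadline problem above than lowPriority does:

```python
import requests

payload = {
    "input": {"prompt": "example"},
    "policy": {
        "executionTimeout": 120000,  # kill a running job after 120s (ms)
        "lowPriority": True,         # the knob mentioned above
        "ttl": 60000,                # drop the job if still queued after 60s (ms)
    },
}
resp = requests.post(
    "https://api.runpod.ai/v2/ENDPOINT_ID/run",  # placeholder endpoint ID
    headers={"Authorization": "Bearer YOUR_RUNPOD_API_KEY"},
    json=payload,
    timeout=30,
)
print(resp.json())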
1AndOnlyPikaOP•3mo ago
That's the opposite of what I want to do.
flash-singh•3mo ago
So you're using request count to scale? And when you need to scale, you submit 50-100 requests very quickly and need the scaling to catch up fast? I'll review this logic for request-count scaling.
Encyrption•3mo ago
They want to spin up 50 workers as quickly as possible to process 50 requests and then shut them all down. They want to ensure that each worker processes only 1 request, so that all 50 run in parallel.
nerdylive•3mo ago
I think if you use the right scaling type on the endpoint and no concurrency settings in the handler, it should do what you want. Maybe RunPod's scaling is a bit slow (not sure).
1AndOnlyPikaOP•3mo ago
I did not make any adjustments to the concurrency settings; I have my workers scaling at 1 by request count. Yes, that's right. My thinking is that request-count scaling should mean that if I have 50 jobs in the queue, 50 workers spin up at once. But from what I observed, the current scaling, even with a request-count scale of 1, takes time to start 50 workers; it takes about 30 seconds.
nerdylive•3mo ago
So the scale-up is slow?
1AndOnlyPikaOP•3mo ago
Yeah, the scale-up is slow. I want all 50 workers to be started at the same time, so that 5 seconds after my batch of requests I have 50 workers ready, instead of 1 new worker every few seconds. Is there a way to do that?
nerdylive•3mo ago
Yeah, let's wait for staff here.
flash-singh•3mo ago
I plan to review the scaling algo and will get back to you.
Encyrption•3mo ago
You could call that algo batch_mode LOL 😉
1AndOnlyPikaOP•3mo ago
Thank you! :TOCheer:
flash-singh•3mo ago
Looks like the scaling algo for request scale is serial: workers are started one after the other. I'll plan optimizations for this in the coming week. Our queue-delay scale method is async; you can try that out and set the queue delay scale time to the lowest possible value, 1 second, which will spin workers up much faster. Please provide some feedback if you do decide to test that out. @1AndOnlyPika
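Switching the scale type programmatically would look like the earlier saveEndpoint sketch, with the scaler fields instead of worker counts; scalerType and scalerValue are likewise assumed from RunPod's GraphQL spec, and the IDs are placeholders:

```python
import requests

mutation = """
mutation {
  saveEndpoint(input: {
    id: "YOUR_ENDPOINT_ID",
    scalerType: "QUEUE_DELAY",
    scalerValue: 1
  }) { id scalerType scalerValue }
}
"""
resp = requests.post(
    "https://api.runpod.io/graphql",
    params={"api_key": "YOUR_RUNPOD_API_KEY"},
    json={"query": mutation},
    timeout=30,
)
print(resp.json())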
1AndOnlyPikaOP•3mo ago
This odd situation happened:
[screenshot attached]
1AndOnlyPikaOP•3mo ago
There were requests sitting in the queue, but no workers being spun up to handle them. The first few workers do start up much faster, but it never reaches 1 request = 1 worker.
1AndOnlyPikaOP•3mo ago
Well, it does, but not evenly; the times end up looking like this:
[screenshot attached]
flash-singh•3mo ago
Can you PM me the endpoint ID?
1AndOnlyPikaOP•2mo ago
Sent, thanks! Yay, looks like the fix was pushed out. Thanks, RunPod!
nerdylive•2mo ago
Nice! How fast is it now?
1AndOnlyPikaOP•2mo ago
Getting all the workers I need started at once, in < 2s if they're ready.
nerdylive•2mo ago
Ooh
Encyrption•2mo ago
That's great! Are you using GraphQL?
nerdylive•2mo ago
Eh, GraphQL? What for?
Encyrption•2mo ago
For starting his 50 workers simultaneously.
nerdylive•2mo ago
Oh, I guess not, but let's see.
flash-singh•2mo ago
Yep, pushed the update an hour ago; it's for serverless using request scale.
1AndOnlyPikaOP•2mo ago
No, just the regular /run endpoint.