Riley
Riley5mo ago

Start and stop multiple pods

I have a product that lets users submit video editing requests, each of which can take anywhere from 0 to 8 minutes of RTX 4090 GPU processing to complete. To handle multiple requests, I wanted to implement a system that starts and stops a group of pods all running the same Docker image, so that spikes in demand could still be handled. However, in my experience, when a pod is stopped, the GPU attached to it may no longer be available when I try to restart it later. That would obviously be a problem: if a GPU is unavailable when a request needs the pod to come back up, the request can't be processed. Is there any way around this issue of GPUs being unassigned, so I can turn pods on and off and make this system feasible? I saw the Serverless option, which seems like it would work for this product, but the cost does not seem feasible. Thank you!
5 Replies
ashleyk
ashleyk5mo ago
Serverless is what you want; GPU Cloud is not suited to this. What is the issue with the Serverless cost?
Riley
Riley5mo ago
The problem we see is that to use Serverless, we'd need at least one worker always active, because the time from deploying to being ready to accept requests is around 15 minutes. That means paying $0.00026/s ($0.936/hr) around the clock, plus the cost of additional flex workers during high volume at $0.00044/s ($1.584/hr). So it seems preferable to manage pods in GPU Cloud, paying $0.74/hr when active and $0.006/hr when inactive. This is all assuming RTX 4090 costs.
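To make the comparison concrete, here's a rough sketch of the daily cost under each option, using the per-second rates above. The busy-hours values are just illustrative inputs, not real traffic numbers:

```python
# Rough break-even sketch using the RTX 4090 rates quoted above.
# "busy_hours" (GPU-busy worker-hours per day) is a hypothetical input.

ACTIVE_HR = 0.00026 * 3600   # serverless active worker: ~$0.936/hr
FLEX_HR   = 0.00044 * 3600   # serverless flex worker:   ~$1.584/hr
POD_ON_HR  = 0.74            # GPU Cloud pod, running
POD_OFF_HR = 0.006           # GPU Cloud pod, stopped

def serverless_daily(busy_hours: float) -> float:
    # One always-active worker all day; flex workers absorb any overflow
    # beyond what a single worker can cover (24 worker-hours/day).
    return 24 * ACTIVE_HR + max(busy_hours - 24, 0) * FLEX_HR

def pod_daily(busy_hours: float) -> float:
    # One pod toggled on/off (ignoring the GPU-availability risk above).
    return busy_hours * POD_ON_HR + (24 - busy_hours) * POD_OFF_HR

for busy in (2, 8, 16, 24):
    print(f"{busy:>2}h busy/day: serverless ${serverless_daily(busy):6.2f}"
          f"  vs pods ${pod_daily(busy):6.2f}")
```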
justin
justin5mo ago
I think this is probably wrong, because starting and stopping a pod is not instant either, it takes time. Whereas if you have loads like that, FlashBoot will help speed you up in a prod env, since it's RunPod's caching in the background.
justin
justin5mo ago
I mean, if you have, let's say:
1) A way to know a user may be about to submit a video, you can prewarm the worker ahead of time by dynamically setting the minimum active workers to one, rather than paying for one 24/7. They have GraphQL mutations for this; maybe ashleyk has a wrapper, idk: https://github.com/ashleykleynhans/runpod-api
2) Or just send an empty request ahead of time to turn a worker on.
3) If the video is not using up the whole GPU, you can use concurrency to have one worker accept multiple jobs, which also means you're parallelizing some requests.
4) When a worker becomes active, it can accept requests without reloading the model and whatever else into memory, if those variables are defined outside the handler in handler.py so they're only loaded once (see the sketch after this list).
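As a concrete illustration of 3) and 4), here's a minimal handler sketch, assuming the standard runpod Python SDK pattern with its `concurrency_modifier` option; `load_model` is a hypothetical stand-in for your own setup code, and the concurrency value of 4 is an arbitrary example:

```python
import runpod

def load_model(path):
    # Hypothetical stand-in for your real model-loading code.
    return object()

# Runs once at module import, when the worker cold-starts --
# warm workers reuse it, so per-request startup cost drops.
MODEL = load_model("/models/video-editor")

def handler(job):
    # job["input"] carries whatever the client submitted.
    params = job["input"]
    # ... do the actual GPU work with MODEL here ...
    return {"status": "done", "received": params}

def concurrency_modifier(current_concurrency):
    # Let one worker take several jobs at once when the GPU
    # has headroom; 4 is an arbitrary example value.
    return 4

runpod.serverless.start({
    "handler": handler,
    "concurrency_modifier": concurrency_modifier,
})
```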
justin
justin5mo ago
But if you really want to do this, maybe the GitHub repo above^ has ways, or at least a lead, so you could wrap the RunPod GraphQL mutations for this purpose. I still think Serverless is perfectly suited for what you want, though.
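If you do go the pod route, a minimal sketch of what that wrapper could look like, assuming the runpod Python SDK's pod functions (`stop_pod` / `resume_pod`) as described in its docs; the pod IDs are hypothetical. Note that resuming can still fail if the GPU was reassigned while the pod was stopped, which is exactly the issue raised at the top of this thread:

```python
import runpod

runpod.api_key = "YOUR_API_KEY"  # placeholder

POD_IDS = ["pod-id-1", "pod-id-2"]  # hypothetical pod IDs in your group

def scale_up():
    for pod_id in POD_IDS:
        try:
            # Resume a stopped pod; this raises if the GPU attached
            # to the pod is no longer available.
            runpod.resume_pod(pod_id, gpu_count=1)
        except Exception as err:
            print(f"{pod_id} could not resume (GPU likely reassigned): {err}")

def scale_down():
    for pod_id in POD_IDS:
        # Stopped pods keep their volume and bill at the idle rate.
        runpod.stop_pod(pod_id)
```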