Need Help with Auto GPU Shutdown & Startup
Hey everyone!
I'm new to RunPod and exploring it for my AI workloads. I’d like to optimize GPU usage by setting up a system where the GPU server automatically shuts down when idle and starts up again only when a job or request needs it.
Is there a clean way to achieve this using RunPod's features—like APIs, webhooks, or serverless functions?
Also, I'd really appreciate it if anyone could share the pros and cons of implementing such a setup (like boot-time delays, possible missed requests, or cost-saving benefits).
Thanks a lot in advance! Excited to be part of this community.
Hey! This sounds like our serverless product: you give us the image to start, and when your endpoint receives a request we start the image, load the model, and handle the request for you.
This method is cheaper than running a permanent pod, at the expense of a somewhat annoying startup time spent loading the model into VRAM to handle your first request. For apps without much traffic, it'll just be a little slow. You can tune things like how many "Active Workers" you want running (to remedy the startup-time problem) and how long a worker stays alive (in seconds) after it handles a request. It's not possible for serverless to miss requests: we assign each request an id and handle the lifecycle of finishing that single job id for you.
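To make the request lifecycle concrete, here's a minimal sketch of how a client might queue a job on a serverless endpoint and then poll its status by job id. This assumes the common pattern of a REST endpoint with `/run` and `/status/<id>` routes under an API base URL and Bearer-token auth; the endpoint id, API key, and payload shape below are all placeholders, so check the RunPod docs for the exact routes and fields your endpoint expects.

```python
import json
import urllib.request

# Assumed API base URL; verify against the official docs.
API_BASE = "https://api.runpod.ai/v2"


def build_run_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST that queues a job on a serverless endpoint.

    The response to this request would contain the job id that the
    platform uses to track the request's lifecycle.
    """
    url = f"{API_BASE}/{endpoint_id}/run"
    body = json.dumps({"input": payload}).encode()
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # placeholder key
            "Content-Type": "application/json",
        },
    )


def status_url(endpoint_id: str, job_id: str) -> str:
    """URL to poll for a given job id until it reports completion."""
    return f"{API_BASE}/{endpoint_id}/status/{job_id}"


# Example (builds the request only; no network call is made here):
req = build_run_request("my-endpoint-id", "my-api-key", {"prompt": "hello"})
print(req.full_url)
print(status_url("my-endpoint-id", "some-job-id"))
```

Because every request gets an id up front, a cold-started worker can take its time loading the model: the job just sits in the queue until a worker is ready, which is why requests aren't dropped during startup.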