Need Help with Auto GPU Shutdown & Startup

Hey everyone! I'm new to RunPod and exploring it for my AI workloads. I’d like to optimize GPU usage by setting up a system where the GPU server automatically shuts down when idle and starts up again only when a job or request needs it. Is there a clean way to achieve this using RunPod's features—like APIs, webhooks, or serverless functions? Also, I'd really appreciate if anyone could share the pros and cons of implementing such a setup (like boot time delays, possible missed requests, or cost-saving benefits). Thanks a lot in advance! Excited to be part of this community.
Dj
7d ago
Hey! This sounds like our serverless product: you give us the image to start, and when your endpoint receives a request we start the image, load the model, and handle the request for you. This method is cheaper than running a permanent pod, at the expense of a somewhat annoying cold-start time spent loading the models into VRAM before the first request can be handled. For apps without much traffic, it'll just be a little slow. You can tune things like how many "Active Workers" you want running (to remedy the startup-time problem) and how long a worker stays alive (in seconds) after it handles a request. It's not possible for serverless to miss requests; we assign each request an id and handle the lifecycle of finishing that single job id for you.
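To make the request lifecycle concrete, here's a minimal client-side sketch. It assumes the current RunPod serverless REST layout (`POST /v2/{endpoint_id}/run` to queue a job, `GET /v2/{endpoint_id}/status/{job_id}` to poll it); double-check the paths against the RunPod API docs before relying on them. The endpoint id, API key, and payload shape are placeholders.

```python
import json
import urllib.request

API_BASE = "https://api.runpod.ai/v2"  # assumed RunPod serverless REST base


def build_run_request(endpoint_id: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST that queues a job on a serverless endpoint.

    RunPod responds with a job id immediately; a cold worker is started
    (if none is warm) and the job is processed asynchronously.
    """
    url = f"{API_BASE}/{endpoint_id}/run"
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_status_request(endpoint_id: str, api_key: str, job_id: str) -> urllib.request.Request:
    """Build the GET that polls a queued job by its id."""
    url = f"{API_BASE}/{endpoint_id}/status/{job_id}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
    )


# Usage sketch (fill in real credentials before uncommenting):
# req = build_run_request("my-endpoint-id", "MY_API_KEY", {"prompt": "hello"})
# with urllib.request.urlopen(req) as resp:
#     job = json.load(resp)  # e.g. {"id": "...", "status": "IN_QUEUE"}
```

Because every job gets an id up front, a request that arrives while no worker is warm simply waits in the queue through the cold start rather than being dropped.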
