Automatically Terminate Idle Pods
I want to write a daemon which will automatically terminate my pod if the GPU has sat idle for x hours.
Has anyone done something like this before and has code lying around for it? Or could someone at least point me to the appropriate APIs?
The GraphQL API works for this.
Well, if you want it to detect that the pod has been sitting idle, what will be the trigger?
A cron job that checks the last time the GPU had an operation run on it. I imagine nvidia-smi has some easy way to do this.
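For the nvidia-smi part, here is a rough sketch of how a check could read GPU usage. As far as I know nvidia-smi reports current utilization rather than a "last used" timestamp, so the job would have to sample it and remember when the GPU was last busy. The function names (parse_gpu_stats, read_gpu_stats) and the dict keys are made up for illustration; the query fields utilization.gpu, memory.used and power.draw are standard nvidia-smi ones.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they are parsed below.
QUERY_FIELDS = "utilization.gpu,memory.used,power.draw"


def _to_float(value):
    """Return the value as a float, or None for entries such as '[N/A]'."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_stats(csv_text):
    """Parse `--format=csv,noheader,nounits` output into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        util, mem, power = [field.strip() for field in line.split(",")]
        stats.append({
            "utilization_pct": _to_float(util),
            "memory_used_mib": _to_float(mem),
            "power_draw_w": _to_float(power),
        })
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (requires a GPU host)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_stats(result.stdout)
```

Calling read_gpu_stats() on the pod should give one entry per GPU; a cron job could treat the pod as idle when every utilization_pct is below whatever threshold you pick.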
Alright then
It's good
Checking back on this. So how do I get my current pod's ID?
I'm looking at the GraphQL and runpodctl documentation and not seeing anything there.
Current pods? Like the current pods on your account?
https://graphql-spec.runpod.io/#query-myself
I think it's there.
Runpodctl also has:
Which you can use for this.
I've seen some people use nvidia-smi for this (I forget exactly how). If it reports that the GPU has been under some threshold for X amount of time, run runpodctl remove pod $RUNPOD_POD_ID, where RUNPOD_POD_ID is an env variable on the pod.
Does it need an API key as well when run inside the container?
Yeah, nvidia-smi outputs the usage too, I think: memory, power, %.
I don't believe so 🧐 I think it just works from inside the container.
Whoa, yeah, that's amazing then.
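Putting the thread together, a minimal sketch of the watchdog might look like the following. It is not tested on a real pod. The threshold, idle limit and poll interval are placeholder values, IdleTracker and max_gpu_utilization are hypothetical names, and it assumes (as suggested above, unverified) that runpodctl works inside the container without extra API key setup and that RUNPOD_POD_ID is set.

```python
import os
import subprocess
import time

IDLE_THRESHOLD_PCT = 5.0       # assumption: below this counts as idle
IDLE_LIMIT_SECONDS = 2 * 3600  # assumption: "x hours" set to 2 here
POLL_SECONDS = 60              # assumption: sample once a minute


def max_gpu_utilization():
    """Return the highest utilization (%) across all GPUs via nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return max(float(value) for value in result.stdout.split())


class IdleTracker:
    """Tracks how long utilization has stayed below a threshold."""

    def __init__(self, threshold_pct, limit_seconds):
        self.threshold_pct = threshold_pct
        self.limit_seconds = limit_seconds
        self.idle_since = None  # time of the first idle sample in the current streak

    def update(self, utilization_pct, now):
        """Record one sample; return True once idle for limit_seconds."""
        if utilization_pct >= self.threshold_pct:
            self.idle_since = None  # GPU is busy, so reset the streak
            return False
        if self.idle_since is None:
            self.idle_since = now
        return now - self.idle_since >= self.limit_seconds


def main():
    pod_id = os.environ["RUNPOD_POD_ID"]  # env variable mentioned in the thread
    tracker = IdleTracker(IDLE_THRESHOLD_PCT, IDLE_LIMIT_SECONDS)
    while True:
        if tracker.update(max_gpu_utilization(), time.monotonic()):
            # Same command as quoted above: runpodctl remove pod $RUNPOD_POD_ID
            subprocess.run(["runpodctl", "remove", "pod", pod_id], check=True)
            break
        time.sleep(POLL_SECONDS)
```

To use it, call main() from whatever starts your background process on the pod. This runs as a long-lived loop rather than a cron job; a cron version would need to store idle_since somewhere (for example a file) between runs, since each invocation starts fresh.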