Automatically Terminate Idle Pods

I want to write a daemon that automatically terminates my pod if the GPU has sat idle for X hours. Has anyone done something like this before and has code lying around for it? Or could someone at least point me to the appropriate APIs?
10 Replies
nerdylive 2mo ago
The GraphQL API works for this. But if you want it to detect that the pod has sat idle, what will the trigger be?
muddyfootprints.
A cron job that checks the last time the GPU had an operation run on it. I imagine nvidia-smi has some easy way to do this.
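For reference, a minimal sketch of the nvidia-smi side in Python. As far as I know, nvidia-smi reports current utilization rather than a "last active" timestamp, so the job would have to remember for itself when it last saw activity. The function names here are made up for illustration; the `--query-gpu=utilization.gpu` and `--format=csv,noheader,nounits` options are standard nvidia-smi flags.

```python
import subprocess


def parse_gpu_utilization(csv_text: str) -> list[int]:
    """Turn nvidia-smi CSV output (one integer percentage per GPU) into a list.

    Raises ValueError if a line is not an integer, e.g. "[N/A]" on GPUs
    that do not report utilization.
    """
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def read_gpu_utilization() -> list[int]:
    """Ask nvidia-smi for the current utilization of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi is missing or exits with an error
    )
    return parse_gpu_utilization(result.stdout)
```

On a pod with one GPU, `read_gpu_utilization()` should return a single-element list such as `[0]` when nothing is running.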
nerdylive 2mo ago
Alright then, that sounds good.
muddyfootprints.
Checking back on this. How do I get my current pod's ID? I've looked at the GraphQL and runpodctl documentation and I'm not seeing anything there.
nerdylive 2mo ago
Current pods? Like the pods currently on your account? I think it's here: https://graphql-spec.runpod.io/#query-myself
justin 2mo ago
runpodctl also has:
runpodctl remove pod $RUNPOD_POD_ID
which you can use. I've seen some people use nvidia-smi for this (I forget exactly how): if it reports that the GPU has been under some threshold for X amount of time, run runpodctl remove pod $RUNPOD_POD_ID. RUNPOD_POD_ID is an environment variable that is set on the pod.
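A rough sketch of that threshold-for-X-time idea in Python, not a tested daemon. `IdleTracker`, `remove_current_pod`, and `watch` are hypothetical names; the only things taken from this thread are the `runpodctl remove pod` command and the `RUNPOD_POD_ID` environment variable. The caller supplies `read_utilization`, any function returning a list of per-GPU utilization percentages (for example, one that parses nvidia-smi output).

```python
import os
import subprocess
import time
from typing import Callable, Optional


class IdleTracker:
    """Tracks how long every GPU has stayed at or below a utilization threshold."""

    def __init__(self, threshold_percent: int, max_idle_seconds: float):
        self.threshold_percent = threshold_percent
        self.max_idle_seconds = max_idle_seconds
        self.idle_since: Optional[float] = None  # None while the GPU is busy

    def update(self, utilizations: list[int], now: float) -> bool:
        """Record one sample; return True once the idle period is long enough."""
        # Note: an empty sample list counts as idle.
        if any(u > self.threshold_percent for u in utilizations):
            self.idle_since = None  # activity seen, restart the clock
            return False
        if self.idle_since is None:
            self.idle_since = now
        return now - self.idle_since >= self.max_idle_seconds


def remove_current_pod() -> None:
    """Terminate this pod; assumes runpodctl works from inside the container."""
    pod_id = os.environ["RUNPOD_POD_ID"]
    subprocess.run(["runpodctl", "remove", "pod", pod_id], check=True)


def watch(
    read_utilization: Callable[[], list[int]],
    tracker: IdleTracker,
    poll_seconds: float = 60.0,
    terminate: Callable[[], None] = remove_current_pod,
) -> None:
    """Poll until the tracker reports a long enough idle period, then terminate."""
    while True:
        if tracker.update(read_utilization(), time.monotonic()):
            terminate()
            return
        time.sleep(poll_seconds)
```

Something like `watch(my_reader, IdleTracker(threshold_percent=5, max_idle_seconds=2 * 3600))` would remove the pod after two hours at or below 5% utilization. Because the idle clock lives in memory, this fits a long-running process better than a cron job; a cron version would need to persist `idle_since` to a file between runs.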
nerdylive 2mo ago
Does it also need an API key when run inside the container? And yeah, I think nvidia-smi outputs the usage too: memory, power, and utilization %.
justin 2mo ago
I don't believe so 🧐 I think it just works from inside the container.
nerdylive 2mo ago
Woah, yeah, that's amazing then.