Support for terminating pods via SkyPilot

Hi, I want to let my training runs go overnight and terminate the pod once they have finished training. To do this, I am currently using SkyPilot. Whenever I try to stop a pod via SkyPilot, I get an error similar to "Stopping is currently not supported for RunPod." Can RunPod please support this feature?
11 Replies
shishito pepprito (OP) 5mo ago
It would also be useful to be able to set image_id so I can use the template runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04 instead of the default, which has an old version of CUDA. CC @Luke
yhlong00000 5mo ago
If you’re using a network volume, there’s no need for a “stop” option since all your data is stored in the network volume. You can safely terminate the pod without losing data. Regarding your second question, I didn’t quite follow. When modifying the template, you can specify any Docker image you prefer.
shishito pepprito (OP) 5mo ago
I am trying to terminate the pod via CLI using the SkyPilot integration, but I get an error that it's not supported. Same for the template: I want to set it via CLI using SkyPilot, but I get an error that it's not supported.
shishito pepprito (OP) 5mo ago
I am trying to build off of this tutorial, using the features in SkyPilot: https://docs.runpod.io/tutorials/integrations/skypilot
nerdylive 5mo ago
is this solved yet?
shishito pepprito (OP) 5mo ago
it is not
nerdylive 5mo ago
I wonder what your yml file looks like (SkyPilot)
shishito pepprito (OP) 5mo ago
I can post it later today. The main thing that differs is that I specify 'image_id' to try to get a torch 2.4 template, but it says it's not supported with RunPod. Working backwards, are there any docs on specifying a template in SkyPilot with RunPod? Is there any way to auto-terminate a pod when it's idle (i.e., when the training run ends)?
nerdylive 5mo ago
I think in the SkyPilot docs? (not sure, I haven't checked)
shishito pepprito (OP) 5mo ago
That's what I used :p I just don't think RunPod supports these features in the integration
nerdylive 5mo ago
Yeah, maybe it hasn't been added to SkyPilot yet
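
On the auto-terminate question raised above: SkyPilot's CLI distinguishes stopping a cluster from tearing it down, and `sky launch` accepts a `--down` flag that requests teardown once the submitted jobs finish. The following is a minimal Python sketch that only builds those command lines. The helper names, the cluster name `train`, and the file `train.yaml` are hypothetical, and whether autodown actually works on RunPod depends on the installed SkyPilot version, which nothing in this thread confirms.

```python
import shlex


def sky_launch_autodown_cmd(task_yaml: str, cluster_name: str) -> list[str]:
    """Build a `sky launch` command that asks SkyPilot to tear the cluster
    down (not stop it) after all jobs finish.

    Assumption: the installed SkyPilot version honors `--down` on RunPod.
    """
    return ["sky", "launch", "-y", "-c", cluster_name, "--down", task_yaml]


def sky_down_cmd(cluster_name: str) -> list[str]:
    """Build a `sky down` command for tearing a cluster down manually."""
    return ["sky", "down", "-y", cluster_name]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(shlex.join(sky_launch_autodown_cmd("train.yaml", "train")))
    print(shlex.join(sky_down_cmd("train")))
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)` on a machine where the `sky` CLI is installed. Because teardown deletes the pod's local disk, this pairs naturally with the network volume suggestion earlier in the thread.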
