RunPod, 3mo ago
jimbo

Support for terminating pods via SkyPilot

Hi, I want to let my training runs go overnight and terminate the pod once they finish training. To do this, I am currently using SkyPilot. Whenever I try to stop a pod via SkyPilot, I get an error similar to "Stopping is currently not supported for RunPod". Can RunPod please support this feature?
11 Replies
jimbo (OP), 3mo ago
It would also be useful to be able to set image_id so I can use the template runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04 instead of the default, which has an old version of CUDA. CC @Luke
yhlong00000, 3mo ago
If you’re using a network volume, there’s no need for a “stop” option since all your data is stored in the network volume. You can safely terminate the pod without losing data. Regarding your second question, I didn’t quite follow. When modifying the template, you can specify any docker image you prefer.
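For the "safely terminate the pod" route, one option is to have the training script terminate its own pod as its last step. The following is only a sketch under assumptions not confirmed in this thread: that the pod exposes the `RUNPOD_POD_ID` and `RUNPOD_API_KEY` environment variables, and that the `runpod` Python SDK is installed and provides `terminate_pod`. The helper name `terminate_current_pod` is hypothetical.

```python
import os


def terminate_current_pod(terminate=None, env=os.environ):
    """Terminate the pod this script runs on; return the pod id, or None.

    Assumption: RunPod sets RUNPOD_POD_ID (and RUNPOD_API_KEY) inside the pod.
    `terminate` can be injected for dry runs; by default the runpod SDK is used.
    """
    pod_id = env.get("RUNPOD_POD_ID")
    if not pod_id:
        # Not running on a RunPod pod (or the variable is missing): do nothing.
        return None
    if terminate is None:
        # Imported lazily so the helper can be dry-run without the SDK installed.
        import runpod

        runpod.api_key = env.get("RUNPOD_API_KEY")
        terminate = runpod.terminate_pod
    terminate(pod_id)
    return pod_id


# Hypothetical usage at the very end of a training script:
#     train()
#     terminate_current_pod()
```

Terminating deletes the pod's container disk, so this only makes sense when checkpoints are written to a network volume or uploaded elsewhere first. If the pod was launched through SkyPilot, SkyPilot's cluster state will likely be stale afterwards and may need a `sky status --refresh`.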
jimbo (OP), 3mo ago
I am trying to terminate the pod via the CLI using the SkyPilot integration, but I get an error that it's not supported. Same for the template: I want to set it via the CLI using SkyPilot, but I get an error that it's not supported.
jimbo (OP), 3mo ago
I am trying to build off of this tutorial, using the features in SkyPilot: https://docs.runpod.io/tutorials/integrations/skypilot
nerdylive, 3mo ago
is this solved yet?
jimbo (OP), 3mo ago
it is not
nerdylive, 3mo ago
I wonder what your YAML file looks like (SkyPilot).
jimbo (OP), 3mo ago
I can post it later today. The main thing that differs is that I specify 'image_id' to try to get a torch 2.4 template, but it says it's not supported with RunPod. Working backwards: are there any docs on specifying a template in SkyPilot with RunPod? And is there any way to auto-terminate a pod when it's idle (i.e., when the training run ends)?
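On the auto-terminate question: SkyPilot distinguishes stopping a cluster (`sky stop`) from tearing it down (`sky down`), and the error quoted in this thread is about stopping. A minimal sketch of "run the job, then tear the cluster down" from the machine that launches it is below. The function names, `train.yaml`, and the cluster name are hypothetical, and whether `--down` (autodown) works for RunPod in a given SkyPilot version is an assumption this thread does not confirm.

```python
import subprocess


def launch_command(task_yaml, cluster, autodown=False):
    """Build a `sky launch` command; `--down` asks SkyPilot to tear down after the job."""
    cmd = ["sky", "launch", "-y", "-c", cluster]
    if autodown:
        cmd.append("--down")  # assumption: autodown is supported for RunPod
    cmd.append(task_yaml)
    return cmd


def teardown_command(cluster):
    """Build a `sky down` command, which terminates the cluster rather than stopping it."""
    return ["sky", "down", "-y", cluster]


def run_then_terminate(task_yaml, cluster, run=subprocess.run):
    """Launch the task, wait for it to finish, then always attempt a teardown."""
    try:
        # Without --detach-run, `sky launch` streams logs until the job ends.
        run(launch_command(task_yaml, cluster), check=False)
    finally:
        run(teardown_command(cluster), check=False)


# Hypothetical usage:
#     run_then_terminate("train.yaml", "overnight-run")
```

This keeps the teardown on the client side, so the launching machine has to stay up overnight. Passing `autodown=True` moves that responsibility to SkyPilot, if the RunPod integration supports it.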
nerdylive, 3mo ago
I think on skypilot docs? (not sure, I haven't checked )
jimbo (OP), 3mo ago
That's what I used :p I just don't think RunPod supports these features in the integration.
nerdylive, 3mo ago
Yeah, maybe it hasn't been added to SkyPilot yet.