How to Estimate the Survival Time of Spot Instances?
I need some advice on estimating the survival time of RunPod Spot instances. I've noticed that sometimes my Spot instances run for several hours without interruption, while other times they get terminated within minutes. This variability makes it challenging to choose between SPOT and ON-DEMAND.
There's no estimate of survival time; demand from other customers is unpredictable, sorry.
Yes, that's why if you're using Spot, you need to handle the termination signals Linux sends on exit,
then move to another pod,
etc., to keep your work running smoothly.
Or use something like SkyPilot, or another orchestration library for cloud GPUs, to make that easier.
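A minimal sketch of the "move to another pod" idea: the job records how far it got on a network volume, and a replacement pod resumes from there instead of starting over. It assumes the work splits into numbered items and that the same volume is mounted at `/workspace` on every pod; the path and function names are hypothetical, not a RunPod or SkyPilot API.

```python
import json
import os

# Assumption: the network volume is mounted at /workspace on every pod (hypothetical path).
CHECKPOINT_PATH = "/workspace/job_checkpoint.json"


def load_checkpoint(path):
    """Return saved progress, or a fresh start if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_item": 0}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path, state):
    """Persist progress so a replacement pod can pick up where this one stopped."""
    with open(path, "w") as f:
        json.dump(state, f)


def run(items, path, work, every=10):
    """Process items, skipping those an earlier (interrupted) pod already finished."""
    state = load_checkpoint(path)
    for i in range(state["next_item"], len(items)):
        work(items[i])
        # Checkpoint every `every` items; at most `every` items are redone after a kill.
        if (i + 1) % every == 0:
            state["next_item"] = i + 1
            save_checkpoint(path, state)
    state["next_item"] = len(items)
    save_checkpoint(path, state)
    return state
```

Calling `run(items, CHECKPOINT_PATH, work)` on a fresh pod after a termination continues from the last saved item. A smaller `every` loses less work per interruption at the cost of more writes to the volume.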
Yeah, agreed that demand from other users is unpredictable. I was hoping there were some statistical methods that predict the survival time.
SkyPilot will be a great tool if I'm going to run batch jobs, thanks for the suggestion. Meanwhile, I sometimes use a RunPod machine as my workstation as well.
Oh, I don't think there is one for now, unless you collect the data yourself,
e.g. by time of day and date.
Hah, yeah, that might be a solution.
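Since RunPod doesn't publish survival statistics, here is a minimal sketch of summarizing data you log yourself. It assumes you record each Spot pod's start time, end time, and whether it was interrupted (as opposed to stopped by you); the record format and function name are hypothetical. It reports the median minutes until interruption, grouped by the hour the pod started.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def survival_by_hour(records):
    """Median minutes until interruption, keyed by start hour (0-23).

    records: iterable of (start, end, interrupted) with datetime start/end.
    Pods you stopped yourself (interrupted=False) are skipped. That biases the
    estimate downward, because long-lived pods are the ones you tend to stop;
    a censoring-aware method such as Kaplan-Meier would avoid this.
    """
    buckets = defaultdict(list)
    for start, end, interrupted in records:
        if interrupted:
            minutes = (end - start).total_seconds() / 60
            buckets[start.hour].append(minutes)
    return {hour: median(vals) for hour, vals in sorted(buckets.items())}


log = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 30), True),
    (datetime(2024, 5, 2, 9, 15), datetime(2024, 5, 2, 10, 45), True),
    (datetime(2024, 5, 1, 22, 0), datetime(2024, 5, 2, 1, 0), True),
    (datetime(2024, 5, 3, 9, 0), datetime(2024, 5, 3, 15, 0), False),
]
print(survival_by_hour(log))  # {9: 60.0, 22: 180.0}
```

The sample log above is made up for illustration; with only a few runs per hour the medians will be noisy, so it takes a while of logging before the numbers mean much.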
I really love the network volumes provided by RunPod; they make using RunPod as a daily workstation possible. I usually run a pod for several hours. If I could pick a Spot bid price that lets a pod survive about 2 hours on average, that would be perfect.
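If you log your own history of bids and how long each Spot pod lasted, picking a price becomes a lookup. A minimal sketch, assuming hypothetical `(bid_per_hour, survival_minutes)` records you collected yourself: it returns the lowest bid whose average observed survival meets the target, ignoring bids with too few samples.

```python
from collections import defaultdict
from statistics import mean


def cheapest_bid(history, target_minutes=120, min_samples=3):
    """Lowest bid whose mean observed survival meets the target, or None.

    history: iterable of (bid_per_hour, survival_minutes) pairs you logged yourself.
    Bids with fewer than `min_samples` observations are skipped as too noisy.
    """
    by_bid = defaultdict(list)
    for bid, minutes in history:
        by_bid[bid].append(minutes)
    for bid in sorted(by_bid):
        runs = by_bid[bid]
        if len(runs) >= min_samples and mean(runs) >= target_minutes:
            return bid
    return None
```

Note that "2 hours on average" can still include pods killed within minutes; swapping `mean` for a low percentile of `runs` would target "usually survives 2 hours" instead.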
Can I at least receive a signal inside the container when the Spot instance is about to be killed, with at least one second to log the necessary state to the volume?
Spot Pods use spare compute capacity, allowing you to bid for those compute resources. Resources are dedicated to your Pod, but someone else can bid higher or start an On-Demand Pod that will stop your Pod. When this happens, your Pod first receives a SIGTERM signal, followed by SIGKILL 5 seconds later. You can use volumes to save any data to disk within that 5-second window, or push data to the cloud periodically.
https://docs.runpod.io/references/faq/#on-demand-vs-spot-pod
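A minimal sketch of using that 5-second window from Python with the standard `signal` module: on SIGTERM the handler writes a small JSON state file, fsyncs it, and exits. The `/workspace/state.json` path is an assumption about where your volume is mounted. One caveat, assuming RunPod stops containers the way `docker stop` does: SIGTERM goes to the container's main process (PID 1), so a script launched from a shell or terminal may need the signal forwarded to it.

```python
import json
import os
import signal
import sys


def save_state(path, state):
    """Write resume state and force it to disk before the SIGKILL arrives."""
    with open(path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())


def make_sigterm_handler(path, state):
    """Build a handler that saves `state` to `path`, then exits cleanly."""
    def handler(signum, frame):
        # Keep this fast: roughly 5 seconds remain before SIGKILL.
        save_state(path, state)
        sys.exit(0)
    return handler


# Assumption: the network volume is mounted at /workspace on this pod.
state = {"step": 0}
signal.signal(signal.SIGTERM, make_sigterm_handler("/workspace/state.json", state))

# ... main work loop: keep state["step"] (and similar small fields) up to date ...
```

The handler holds a reference to the same `state` dict the main loop mutates, so it always saves the latest values. Keep the dict small; anything large should already be on the volume via periodic saves.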
Try On-Demand if you don't want terminations like on Spot, haha.
It's not really enough time to save files; that most likely takes longer than 5 seconds.
Time is money 😆
😉 And I am planning to use the 5 seconds to log the status so that I can resume on another machine without pain.
Yeah, sure, that might work.
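One risk with logging status right before a kill is that SIGKILL lands mid-write and leaves a truncated file. A common pattern, sketched here with hypothetical function names, is to write to a temp file in the same directory and then swap it in with `os.replace`, so the old state stays intact if the write is cut off. On a local POSIX filesystem the rename is atomic; whether RunPod's network volume preserves that guarantee is an assumption worth checking.

```python
import json
import os
import tempfile


def write_state_atomically(path, state):
    """Replace `path` with new JSON state; readers never see a half-written file.

    The temp file is created in the same directory so os.replace is a rename
    within one filesystem. If the process is killed before the rename, the
    previous state file is untouched (a stray .tmp file may be left behind).
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file on a catchable error, then re-raise.
        os.unlink(tmp_path)
        raise


def read_state(path, default):
    """Load the last complete state, or `default` on a fresh start."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

On the new machine, `read_state(path, {"step": 0})` returns either the last fully written state or the default, never a partial file.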