Bad pods on Serverless
I see that about 20-30% of the workers that are spawned fail with this error:
error starting: Error response from daemon:
error starting: Error response from daemon: No such container
Can you please look into this? serverless endpoint id: 1busjme5syomep
13 Replies
It looks like you’ve reached your spending limit, and I’ve also noticed that sometimes the GPU you rented isn’t available. I’ve adjusted your spend limit, let me know if things improve.
Thank you. Can you please check nkckkkj01omcv5 as well?
I will monitor for a bit and let you know
This one also shows some "not enough GPUs" errors, but I also see a bunch of errors from your script, script3.py. It fails many times, so you might want to check it.
Yes, I am looking into the errors. I had another question about using NFS mounts. I am invoking the Serverless API from a remote server, and my expectation is that both sides share the same file system. Is that doable?
Also, is there a way to cancel a job if the delay time is > x seconds?
Mounting your own file server on a serverless endpoint isn’t supported. The delay time is how long the request has been sitting in the queue without being processed. Are you looking to cancel jobs that stay in the queue for more than ‘x’ seconds?
Thanks. Yes, that’s correct. Anything sitting in the queue for more than 30 seconds should not be processed.
So far we don't have this feature, but there is a workaround: check the status of the job, which gives you delayTime (how long the request has been sitting in the queue). If it is more than 'x' seconds, you can send a cancel request.
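For reference, the status-then-cancel workaround could look roughly like the sketch below. It is not an official client. It assumes the `/status/{job_id}` and `/cancel/{job_id}` routes under `https://api.runpod.ai/v2/{endpoint_id}`, Bearer-token auth, a queued state named `IN_QUEUE`, and that `delayTime` is reported in milliseconds; please verify all of these against the current API docs. The helper names `should_cancel` and `cancel_if_stale` are made up for illustration.

```python
import json
import urllib.request

# Assumed base URL for the serverless API; confirm against the docs.
API_BASE = "https://api.runpod.ai/v2"


def should_cancel(job, max_delay_s=30.0):
    """Return True if the job is still queued and has waited too long.

    Assumptions: `delayTime` is in milliseconds and a queued job
    reports the status string "IN_QUEUE".
    """
    delay_ms = job.get("delayTime") or 0
    return job.get("status") == "IN_QUEUE" and delay_ms / 1000.0 > max_delay_s


def _call(method, url, api_key):
    """Send one authenticated request and return the decoded JSON body."""
    req = urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def cancel_if_stale(endpoint_id, job_id, api_key, max_delay_s=30.0, call=_call):
    """Check a job's status and cancel it if it has been queued too long.

    Returns True if a cancel request was sent, otherwise False.
    `call` is injectable so the logic can be exercised without the network.
    """
    job = call("GET", f"{API_BASE}/{endpoint_id}/status/{job_id}", api_key)
    if should_cancel(job, max_delay_s):
        call("POST", f"{API_BASE}/{endpoint_id}/cancel/{job_id}", api_key)
        return True
    return False
```

You would call `cancel_if_stale(endpoint_id, job_id, api_key)` on a timer from the remote server for each job you submitted. Because this is client-side polling, a job can still be picked up by a worker between the status check and the cancel request.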
Thanks
Is there any way to access RunPod's network volume from outside RunPod? I mean without needing to rent the cheapest GPU (in that DC) and using that to access the network volume.
No
You could use a CPU instead; it is much cheaper.
I thought network volumes on CPU were currently disabled?
Yeah, I think you can still use a network volume with CPU serverless. Technically you can do it, but it may be too complicated 🥲