RunPod6mo ago
Monster

not enough GPUs free

Hi there, wish you a good day. I have a serverless endpoint running on RunPod, created on top of a network volume in the US-OR-1 data center. It was running well for several days, but 20 minutes ago it started failing: no worker can be created because there are no free GPU resources. The system throws this log repeatedly:

2024-07-13T06:32:22Z create container USERNAME/ENDPOINT
2024-07-13T06:32:22Z error creating container: not enough GPUs free

How can I make sure GPU resources are available whenever a request comes in? Should I move the endpoint and the network volume to another region that has more GPU resources? How often will this kind of shortage happen? It poses a risk to the stability and quality of service, which is critical in most scenarios. Thank you.
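One way to tolerate short "not enough GPUs free" windows on the client side is to submit jobs asynchronously and retry with backoff until a worker picks them up. The sketch below is a minimal illustration, assuming RunPod's documented serverless HTTP routes (/run and /status); the endpoint ID, payload shape, and timing values are placeholders and not anything taken from this thread, so verify them against the current docs.

```python
import os
import time
import requests

# Placeholders -- substitute your own endpoint ID; the API key is read from the env.
ENDPOINT_ID = "YOUR_ENDPOINT_ID"
API_KEY = os.environ["RUNPOD_API_KEY"]
BASE = f"https://api.runpod.ai/v2/{ENDPOINT_ID}"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}


def run_with_retry(payload, max_attempts=5, poll_timeout=600, base_delay=30):
    """Submit a job asynchronously, poll it to completion, and retry with
    exponential backoff if it fails or stalls (e.g. during a GPU shortage)."""
    for attempt in range(1, max_attempts + 1):
        try:
            # Asynchronous submit: returns a job ID immediately.
            resp = requests.post(f"{BASE}/run", json={"input": payload},
                                 headers=HEADERS, timeout=30)
            resp.raise_for_status()
            job_id = resp.json()["id"]

            # Poll until the job reaches a terminal state or the deadline passes.
            deadline = time.time() + poll_timeout
            while time.time() < deadline:
                status = requests.get(f"{BASE}/status/{job_id}",
                                      headers=HEADERS, timeout=30).json()
                if status.get("status") == "COMPLETED":
                    return status.get("output")
                if status.get("status") in ("FAILED", "CANCELLED", "TIMED_OUT"):
                    break  # fall through to the retry/backoff below
                time.sleep(5)
        except requests.RequestException:
            pass  # network or API error -- retry below

        time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise RuntimeError("job did not complete after retries")
```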
22 Replies
nerdylive
nerdylive6mo ago
Try reporting it to RunPod via the contact form, but for now, try changing your env variables (add any dummy one) or updating your template image to another tag/version.
Monster
MonsterOP6mo ago
I did not get it. The image version is my customized build on Docker Hub; it might be v1, v2, v3, anything. How is that related to competing for GPU resources? And which env file should I edit to add the dummy variable, and how? Thank you so much.
nerdylive
nerdylive6mo ago
Add a dummy env variable from the edit-template option.
nerdylive
nerdylive6mo ago
[image attachment]
nerdylive
nerdylive6mo ago
It re-creates the workers when you do that.
Monster
MonsterOP6mo ago
Anything? Like ENVDUMMY=anything?
nerdylive
nerdylive6mo ago
Add an env variable, anything: anything, yes. Then save and it'll redeploy.
Monster
MonsterOP6mo ago
OK, so you're suggesting it is not actually a resource shortage, it is a bug, and that is why I should add the dummy env variable and/or update the image version?
nerdylive
nerdylive6mo ago
Yeah, probably. It's for redeploying.
Monster
MonsterOP6mo ago
ok, thx
nerdylive
nerdylive6mo ago
You're welcome @Monster. Did that work?
Monster
MonsterOP6mo ago
I have added a dummy env variable. It did work, but that doesn't mean the problem is solved, since the error went away on its own after 2 or 3 minutes, after a couple of attempts to start new workers. So, any idea or suggestion on how to keep it from happening again? Thx.
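For spotting shortages before they hit users, a rough sketch like the one below could poll the endpoint's health route, assuming the documented /health response that reports per-endpoint worker and job counts; the exact field names (workers.idle, jobs.inQueue) are assumptions to check against the current API docs, and the endpoint ID is a placeholder.

```python
import os
import time
import requests

ENDPOINT_ID = "YOUR_ENDPOINT_ID"   # placeholder, not from this thread
API_KEY = os.environ["RUNPOD_API_KEY"]
HEALTH_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health"


def watch_endpoint(interval=60):
    """Poll the endpoint's health route and warn whenever jobs are queued
    but no idle workers are available (a sign of a regional GPU shortage)."""
    while True:
        data = requests.get(HEALTH_URL,
                            headers={"Authorization": f"Bearer {API_KEY}"},
                            timeout=30).json()
        workers = data.get("workers", {})   # assumed shape, e.g. {"idle": 0, "running": 1}
        jobs = data.get("jobs", {})         # assumed shape, e.g. {"inQueue": 3, "inProgress": 1}
        if jobs.get("inQueue", 0) > 0 and workers.get("idle", 0) == 0:
            print("warning: jobs queued but no idle workers "
                  "(possible GPU shortage in this region)")
        time.sleep(interval)
```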
nerdylive
nerdylive6mo ago
I think it's more likely an internal bug on their side, so the best way is to report it to RunPod via a support ticket on the website so they can fix it. Try submitting one if you haven't 👍
Monster
MonsterOP6mo ago
Yes, I did report it. Thank you so much for the support.
nerdylive
nerdylive6mo ago
No problem, I'm happy it's solved for now. Alright, great, let's just wait for their response.
Monster
MonsterOP6mo ago
"they" what do you mean "they", I thought you are from runpod support team, isn't you
nerdylive
nerdylive6mo ago
No, I'm not an official support team member.
Monster
MonsterOP6mo ago
So are you hired by them, or are you a volunteer giving support based on your experience?
nerdylive
nerdylive6mo ago
Neither, but I've been invited here as a community helper. I can't really access RunPod's internal systems, nor your resources. What's up? Why are you asking? Haha
Monster
MonsterOP6mo ago
I feel you are very confident and familiar with all sorts of issues, platforms, and technologies, at least what I'd expect of a senior member of their support team. So it surprised me that you don't have access to RunPod's internals.
nerdylive
nerdylive6mo ago
Yes, I've been here quite a long time and I'm also an active developer (not at RunPod) 😆 I hope someday maybe I can join RunPod.
Monster
MonsterOP6mo ago
Yes, you will for sure. Thx anyway.