not enough GPUs free
Hi there,
I hope you're having a good day. I have a serverless endpoint running on RunPod, created on top of network storage that belongs to the US-OR-1 data center. It was running well for some days, but 20 minutes ago I ran into an issue where no worker can be created because no GPU resources are available. The system repeatedly logs the following:
2024-07-13T06:32:22Z create container USERNAME/ENDPOINT
2024-07-13T06:32:22Z error creating container: not enough GPUs free
How can I make sure GPU resources are available whenever a request comes in? Should I move the endpoint and the network volume to another region that has more GPU resources? How often will this shortage happen? It poses a risk to the stability and quality of service, which is critical in most scenarios.
Thank you.
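To get a rough answer to "how often does this happen", the log lines above can be counted offline. This is only a minimal sketch, assuming the worker logs have been copied into a plain-text string in exactly the format shown above; the function names `parse_shortage_times` and `shortage_span_seconds` are hypothetical helpers, not part of any RunPod tooling.

```python
from datetime import datetime, timezone

# Exact message taken from the log excerpt in this post.
ERROR_TEXT = "error creating container: not enough GPUs free"


def parse_shortage_times(log_text):
    """Return the UTC timestamp of every 'not enough GPUs free' log line."""
    times = []
    for line in log_text.splitlines():
        # Each line is assumed to be "<ISO timestamp>Z <message>".
        stamp, _, message = line.strip().partition(" ")
        if message != ERROR_TEXT:
            continue
        parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        times.append(parsed.replace(tzinfo=timezone.utc))
    return times


def shortage_span_seconds(times):
    """Seconds between the first and last shortage error (0.0 if none)."""
    if not times:
        return 0.0
    return (max(times) - min(times)).total_seconds()


if __name__ == "__main__":
    sample = (
        "2024-07-13T06:32:22Z create container USERNAME/ENDPOINT\n"
        "2024-07-13T06:32:22Z error creating container: not enough GPUs free\n"
    )
    errors = parse_shortage_times(sample)
    print(len(errors), "shortage error(s),", shortage_span_seconds(errors), "seconds apart")
```

Counting errors and the time they span gives concrete numbers to include in a support ticket; it does not prevent the shortage itself.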
22 Replies
Try reporting it to RunPod through their contact page.
But for now, try changing your env variables (add a dummy one with any value),
or update your template image to another tag/version.
I don't get it. The image is my own customized version on Docker Hub; it might be V1, V2, V3, anything. How is that related to competing for GPU resources? And which env file should I edit to add the dummy variable, and how? Thank you so much.
Add a dummy env variable from Edit Template.
It recreates the workers
when you do that.
Anything?
Like ENVDUMMY=anything
Add an env variable, anything: anything.
Yes.
Then save.
It'll redeploy.
OK, so you're suggesting it isn't actually a resource shortage, it's a bug,
and that's why I should add the dummy env variable or update the image version?
Yeah, probably.
It's for redeploying.
ok, thx
You're welcome
@Monster Did that work?
I added a dummy env variable. It did work, but that doesn't mean the problem is solved, since the error went away by itself after 2 or 3 minutes, after a couple of attempts to create a new worker. So, any idea or suggestion on how to keep it from happening again? Thanks.
I think it's more likely an internal bug on their side,
so I think the best way is to report it to RunPod via a support ticket on the website so they can fix it.
Try submitting one if you haven't 👍
Yes, I did report it. Thank you so much for the support.
No problem, I'm happy it's solved for now.
Alright, great, let's just wait for their response.
"They"? What do you mean by "they"? I thought you were from the RunPod support team, aren't you?
No, I'm not an official support team member.
So are you hired by them, or are you a volunteer giving support based on your experience?
Neither, but I was invited as a community helper here.
I can't really access RunPod's internal systems or your resources.
What's up? Why are you asking, haha?
You seem very confident and familiar with all sorts of issues, platforms, and technologies. I assumed you were at least a senior member of their support team, so it surprised me that you don't have access to RunPod's internal systems.
Yes, I've been here quite a long time and I'm also an active developer (not at RunPod) 😆
I hope someday maybe I can join RunPod.
Yes, you will for sure. Thanks anyway.