All workers are throttled even though availability shows as medium?
We created a serverless endpoint and noticed that none of our requests were being processed. When we looked inside the endpoint, all the workers were shown as throttled, even though the GPUs' availability status says they are available. How can we solve this?
It has been waiting like that for more than 30 minutes?
@Deniz This can happen when a single company takes up all the GPUs
due to huge spikes in demand.
It's something RunPod is working on.
BTW, I'm just a community member lol, I had asked about this too.
The way I handle it is to set a minimum worker count when I'm in this situation and use the /health endpoint to check
whether I was able to get the GPU back.
I haven't written code to do this dynamically yet, but RunPod has a GraphQL API where you can update your endpoints live.
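A minimal Python sketch of the health-check half of that workaround. It assumes the health URL pattern https://api.runpod.ai/v2/{endpoint_id}/health with a Bearer API key, and that the response contains a "workers" object of integer counts that includes a "throttled" key; check RunPod's docs before relying on any of that. The helper next_min_workers and its cap parameter are hypothetical. Actually applying the new minimum through RunPod's GraphQL API is not shown here.

```python
import json
import urllib.request

# Assumed URL pattern for a serverless endpoint's health check.
HEALTH_URL = "https://api.runpod.ai/v2/{endpoint_id}/health"


def fetch_health(endpoint_id: str, api_key: str) -> dict:
    """Fetch the endpoint's health JSON (assumed URL and Bearer auth)."""
    req = urllib.request.Request(
        HEALTH_URL.format(endpoint_id=endpoint_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def all_workers_throttled(health: dict) -> bool:
    """True if every reported worker is throttled.

    Assumes health["workers"] maps worker states to integer counts.
    """
    workers = health.get("workers", {})
    total = sum(v for v in workers.values() if isinstance(v, int))
    return total > 0 and workers.get("throttled", 0) == total


def next_min_workers(health: dict, current_min: int, cap: int) -> int:
    """Hypothetical policy: bump the minimum worker count by one while
    everything is throttled, never going above `cap`."""
    if all_workers_throttled(health) and current_min < cap:
        return current_min + 1
    return current_min
```

Keeping the decision logic separate from the HTTP call means it can be tested against sample JSON. The returned value would then be pushed to the endpoint with whatever GraphQL mutation RunPod documents for updating endpoints.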
Have you limited this endpoint to a specific data center (DC) or network volume?
@flash-singh this endpoint is not limited to any network volume
I see that it's better now. This is something we need to get better at: moving workers when all of them get throttled at once.
This is a very common situation right now!
Is there any workaround for this throttling issue?
The 16GB GPU tier has low availability; it's probably better to switch to the 24GB tier.