RunPod · 14mo ago
blistick

What does "throttled" mean?

My endpoint dashboard sometimes shows "1 Throttled" worker, and 0 other workers, except for queued ones. What does the "throttled" status mean, and how do I prevent the condition?
9 Replies
Solution
justin · 14mo ago
From my understanding, and this is by no means official: "throttled" means that other services are using the GPU. I recommend having at least 2 max workers (RunPod will then allocate 5 workers on your endpoint). Those workers can all potentially pick up jobs, but the number working at any one time never exceeds the max you chose. There is no way to prevent throttling unless you require some minimum number of workers to always be active. From my experience, throttling can also happen when there are issues with RunPod itself, but that is rarer. You can use the /health endpoint to regularly check your endpoint and make sure you have idle or active workers ready.
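A minimal sketch of the /health check mentioned above, not official guidance. It assumes the endpoint lives at `https://api.runpod.ai/v2/<endpoint_id>/health`, accepts a Bearer API key, and returns JSON with a `workers` object holding counts such as `idle`, `running`, and `throttled`. Those field names, the helper names, and the `RUNPOD_ENDPOINT_ID` / `RUNPOD_API_KEY` environment variables are assumptions for illustration; verify them against the current RunPod docs.

```python
import json
import os
import urllib.request

# Assumed URL shape for the serverless health route; confirm in the RunPod docs.
HEALTH_URL = "https://api.runpod.ai/v2/{endpoint_id}/health"


def fetch_health(endpoint_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """GET the health payload for one endpoint (performs a network call)."""
    req = urllib.request.Request(
        HEALTH_URL.format(endpoint_id=endpoint_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize_workers(health: dict) -> dict:
    """Reduce a health payload to the numbers this thread cares about.

    Missing fields are treated as 0, since the exact payload is an assumption.
    """
    workers = health.get("workers", {})
    idle = int(workers.get("idle", 0))
    running = int(workers.get("running", 0))
    throttled = int(workers.get("throttled", 0))
    available = idle + running
    return {
        "available": available,
        "throttled": throttled,
        # The situation described above: nothing usable, only throttled workers.
        "all_throttled": available == 0 and throttled > 0,
    }


if __name__ == "__main__":
    # Hypothetical environment variable names, used only for this example.
    endpoint_id = os.environ.get("RUNPOD_ENDPOINT_ID")
    api_key = os.environ.get("RUNPOD_API_KEY")
    if endpoint_id and api_key:
        status = summarize_workers(fetch_health(endpoint_id, api_key))
        if status["all_throttled"]:
            print("Warning: every worker is throttled", status)
        else:
            print("Workers:", status)
```

Running this on a schedule (cron, or a small loop with a sleep) gives an early warning when the endpoint has only throttled workers. It does not prevent throttling.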
justin · 14mo ago
I was once in a state where all workers were throttled, and I was very confused. This is quite rare, but it happens. Here is the thread where I asked about it at the time: https://discord.com/channels/912829806415085598/1187367253201657918/1187367253201657918
ashleyk · 14mo ago
If you are using the RO region with network storage, capacity there has been greatly reduced since yesterday. All my 24GB workers in the RO region have been constantly throttled since then.
blistick (OP) · 14mo ago
@justin @ashleyk Thank you both very much for this advice. To summarize, it seems I should (a) have at least 2 max workers, and (b) enable as many regions as possible for my endpoint. (@justin I followed your previous advice about improving worker startup time by NOT using a network drive, which really helped, btw, but I forgot to edit my endpoint to allow more regions.) Constant throttling is rather scary from a production standpoint.
justin · 14mo ago
It is. Unfortunately I don't know what to do either 🥲 Official guidance would definitely be good. I think you can set a minimum worker count to guarantee availability, but that costs us uptime even at a 40% discount. I know there is an update GraphQL endpoint, but I've never found out what happens if I update it to a minimum of 1 worker when everything is throttled.
blistick (OP) · 14mo ago
Yes, official guidance would be good. Like you, I don't want to incur the cost of an always-active worker.
jojje · 2w ago
@justin [Not Staff] Are the "workers" bound to a specific data center (region)? If not, then I don't see why adding more workers would help, since the availability of the requested GPUs wouldn't change one iota. They'd all just be throttled as well, for the same reason the initial ones were. But if a worker is pegged to a specific colo, then it would make sense, as the resource horizon would be limited to that single colo. Do you know which of these holds true for workers: DC pegged at creation, or pegged only once a resource match has been found?
justin · 2w ago
Yes, workers are bound to a specific region. Essentially, a worker is as if you spun up a pod in a data center, so adding more workers lets RunPod put more workers on standby that meet your region and GPU criteria. They stay pegged to your app unless some sort of rotation rule rotates them out. (To my understanding, as a community helper.)
jojje · 2w ago
I think it's the word choice of "throttled" that causes the confusion. RunPod is hijacking an established term whose de facto meaning is "mitigation of a user-induced policy violation" and using it to mean "Pending" or "Queued", that is, "waiting for the requisite resource condition before a task can execute". If they'd used either of the latter terms, I expect there wouldn't be nearly as many questions about serverless throttling.