All 27 workers throttled
Our company needs stable availability of at least 10 workers. Recently most of our workers, or even all of them, are throttled. We have already spent more than $800-1000 on your service and would be very grateful for a stable number of the requested workers. IDs: 6lxilvs3rj0fl7, 97atmaayuoyhls. Our customers have to wait for hours...
Does your endpoint use network storage in RO region?
Network is in EU-CZ-1
Our company would be very grateful for a solution. Availability has stayed the same for the last few days. Due to the huge waiting times we are losing money. We were thinking of slowly increasing the count to 30+, but right now we can't even get 5 stable workers.
Yeah looks like its basically a no-go in that region, you may want to consider setting up a new endpoint in either EU-SE-1 or EU-NO-1 regions. I had this same issue with EU-RO-1 and had to create a new endpoint.
The thing is that the network volume itself doesn't allow other regions, even if I deploy the endpoint to any location
Yeah I created a new network volume as well.
Its very inconvenient but better than having down time and losing money.
https://discord.com/channels/912829806415085598/1194711850223415348
You can refer to this for how to copy data over, in case downloading it from some other source isn't an option
https://discord.com/channels/912829806415085598/1209602115262095420
This was also something we gave as feedback to @flash-singh. Sadly, the fact that serverless workers can get fully throttled across the board in a region is something I find frustrating / insane too
Yeah it shouldn't happen that every single worker becomes throttled and brings down our production applications.
How often does this problem happen? We recently moved to serverless instead of GPU cloud, but the experience has been quite disappointing so far
Just wondering, how big are your models?
about 3gb, one model
Happens A LOT. Happened to me at least 3 or 4 times in the last 6 months.
probably even smaller
I think for the 4090s the 24gb Pro, it happens a decent amount. I try to avoid it and go 24gb + 48gb gpu.
Also if your model is only 3gb
build it into the image instead
You'll get way more flexibility
and less of this issue - I don't have problems with those endpoints with 10+ workers
anything that is < 35gb
I build into my image
if it doesn't need dynamic switching
Already using 24 + 24 pro. Where can i find more info about this method?
All 24GB PRO in RO are gone, that's why all my workers in RO are throttled. In a matter of WEEKS it went from high availability for 4090s to nothing, and all my workers throttled
And how long does it take to be resolved on average?
When you select, select 1 on the 48pros, and 2 as the 24gb.
Also, if you build the model into the image and get off network storage, you'll be able to use all data centers, not just the ones tied to the network volume
Weeks, months, I move to a new endpoint
I saw someone recently, @kopyl, who was throttled for an hour. So I suggest in your situation, move to building the model into the image, and it shouldn't be an issue
48GB PRO is low availability, I don't recommend it
The thing is i am using automatic1111 + custom model + LORAs
Same here
I'm just sharing what I have - I get high availability on 16gb and 48 Pro, at least for me, with no network region
dockerhub lets u have one private repo
that's what i do for my private stuff
unless u have more stuff
It's always the 4090s that bottleneck me
WTF shows LOW for me without a network volume
So you manually push volumes to dockerhub and build from image directly?
u could be right ashleyk, just found out I'm throttled across the board
No not push volumes to dockerhub
You can just do a function call in your Dockerfile to download the model
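Something like this, for example - a rough sketch where the URL and paths are placeholders (not anything from this thread); the Dockerfile just COPYs the script in and RUNs it at build time so the weights end up baked into the image layer:

```python
# download_model.py - sketch of baking a ~3 GB model into the Docker image at
# build time instead of reading it from a network volume.
# Invoked from the Dockerfile, e.g.:
#   COPY download_model.py /download_model.py
#   RUN python /download_model.py
import os
import urllib.request

MODEL_URL = "https://example.com/your-model.safetensors"  # hypothetical source URL
DEST = "/workspace/models/your-model.safetensors"         # path your handler loads from

os.makedirs(os.path.dirname(DEST), exist_ok=True)
# Stream the file to disk so the build doesn't hold the whole model in memory.
with urllib.request.urlopen(MODEL_URL) as resp, open(DEST, "wb") as f:
    while True:
        chunk = resp.read(1 << 20)  # 1 MiB at a time
        if not chunk:
            break
        f.write(chunk)
```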
Maybe became medium availability for a brief moment, workers are constantly moving around
this is so frustrating)))
ok i see wym
Thank you!
yea i asked flash about this before, and it's b/c someone can just eat up all the gpus for their super big clients. Something I'm debating on is, if I get fully throttled across the board, I use their graphql endpoint
to set a minimum of 2 active workers
to steal back workers
GitHub - justinwlin/runpod-api: A collection of Python scripts for calling the RunPod GraphQL API
@ashleyk got a repo on that
It isnt an instant switch
but better than getting fully throttled
it seems to respect minimum workers
and prioritize it
And I will be able to use all data centers? Will the problem be resolved, or does it still happen sometimes even with the bigger number of data centers?
You'll be able to use all data centers and not be locked to a region
I think the problem will happen more rarely. @flash-singh supposedly has said that if a worker is throttled for an hour, it terminates and switches it out, but that is crazy to me - why would it allow us to fall into an all-workers-throttled situation? Also I'm not sure that really happens, to be honest
so I recommend maybe exploring the minimum-worker force scenario, b/c I ping the /health on my endpoint routinely
an ex of me pulling a minimum of 2 workers now
to forcefully get my workers back
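Roughly what I have in mind - a sketch, not official RunPod tooling; the /health response shape and the saveEndpoint input follow the graphql-spec / runpod-api scripts linked above, so treat the exact field names and the placeholder values as assumptions to verify:

```python
# force_min_workers.py - sketch of the "steal back workers" idea: poll the
# serverless /health endpoint and, if jobs are queued but nothing is running,
# temporarily raise workersMin via the GraphQL API.
import os
import requests

API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT_ID = "qie98s97wqvw4t"  # example endpoint id from this thread

def health():
    r = requests.get(
        f"https://api.runpod.ai/v2/{ENDPOINT_ID}/health",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    r.raise_for_status()
    return r.json()  # expected shape: {"workers": {...}, "jobs": {...}}

def set_min_workers(n: int):
    # saveEndpoint wants the rest of the endpoint config back as well (name,
    # gpuIds, templateId, ...); the placeholders must match your real endpoint.
    mutation = """
    mutation SetMin($input: EndpointInput!) {
      saveEndpoint(input: $input) { id workersMin workersMax }
    }"""
    variables = {"input": {
        "id": ENDPOINT_ID,
        "name": "my-endpoint",        # placeholder
        "gpuIds": "AMPERE_24",        # placeholder
        "templateId": "my-template",  # placeholder
        "workersMin": n,
        "workersMax": 12,
    }}
    r = requests.post(
        f"https://api.runpod.io/graphql?api_key={API_KEY}",
        json={"query": mutation, "variables": variables},
        timeout=10,
    )
    r.raise_for_status()
    return r.json()

if __name__ == "__main__":
    h = health()
    queued = h.get("jobs", {}).get("inQueue", 0)
    running = h.get("workers", {}).get("running", 0)
    # If work is queued but nothing is running, assume we're throttled across
    # the board and force a couple of active workers to pull capacity back.
    if queued > 0 and running == 0:
        print(set_min_workers(2))
```

(and remember to set workersMin back to 0 afterwards, otherwise you keep paying for active workers)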
maybe make ur numbers look like this
4090s are always eaten up, so should prob be the #3
or whatever the lowest number is
tbh idk what the numbers even do, which i complained about too
are you mostly looking for A5000s and 24gb mostly?
yes
EU-SE-1 is the best for that, EU-CZ-1 always has low quantity of those, and 3090s are always taken, were you looking for 3090s?
are you able to move storage?
we look for 24gb GPUs, the GPU model does not matter. I guess I can make a new storage volume in a different data center
you can either make a new endpoint, or switch your current one to use EU-SE-1, currently that one has the biggest capacity for 48gb and 24gb and 16gb but they do not have 4090s
B/c his model is 3gb, I think it's better to just build it into the docker image in situations like that, right? Then he wouldn't be limited to a region?
and he can also just take out EU-CZ-1 from his region list
so he doesnt get assigned any there?
yes i would never use a network volume if you're running 1 static model
ty! will try the method above
yep pick global and it will automatically pick most available servers across all regions
EU-SE-1 has plenty of capacity but its also newer compared to most of other ones
do u guys plan to make a chart or something detailing this information at some point
or do we only have to get this anecdotally
tbh im not using anything special, i just go click EU-SE-1 and see they're all high
but yes we do need to get better at showing availability, we also have a bug with network storage tab showing you wrong availability, we are working on fixing that this week
i def understand the frustration, it causes us stress as well, but solving scale for GPUs is more complicated and requires big investment, we are trying to push in all directions to be better at this
yeah still thx u runpod for making gpu / ml saas businesses a whole lot easier lol
still many pain points as you can see, getting there by the day
By the way not using network storage doesn't even help, this endpoint of mine doesn't use any network storage and almost all my workers are throttled, this is a serious problem with 24GB GPU, basically zero availability anywhere.
Massive problem, we have a stand at the PBX Expo in Las Vegas and this is impacting our product demonstrations
CC: @JM
I don't understand, because if I edit my endpoint, it says "High Availability" for 24GB yet basically all my workers are throttled.
Not sure if this helps / u prob already did it, but I had to reset my max workers to 0 and then back to 12, and kick out EU-CZ-1 so I don't get assigned any GPUs from that region. I think the big problem with RunPod's workers right now is that they seem to only stay on the first assigned GPU, and I had the same experience where after editing my endpoints I was still fully throttled until I forcefully refreshed all the workers.
Edit: could setting minimum workers temporarily, while the stand is active, relieve the issue? x.x..
/ @JM / @flash-singh hopefully can chime in tho .-. i also am confused what the best steps are in these situations; if we edit the endpoint do we need to refresh all the workers? what is the expected procedure..
Wow thats a major fail, if all my workers end up in CZ and get throttled, it should pick workers from somewhere else
Good question, changing priority made zero difference, I had to scale workers down to zero and back up again which sucks
Totally agree extremely frustrating
I moved all my endpoints to kick cz-1 out so im not assigned a bad region cause the priority algorithm rlly is bad and seems to do nothing
I changed all my endpoints from 24GB to 48GB, 24GB tier is totally and utterly fucked up and completely unusable and nice how nobody from RunPod bothers to fucking respond when we have a fucking PRODUCTION ISSUE. THIS IS TOTALLY UNACCEPTABLE!!!!!!!!!!!!!!!!!!!!!!!!
I am looking for a new provider in the morning, RunPod is utter shit if you can't get support.
cc @Zeen
https://discord.com/channels/912829806415085598/1209973235387474002
I agree, you guys need to change the priority algorithm, to something similar to my feedback. It at least needs to be visibly proactive trying to find workers, and start shifting at least two-three workers immediately out of throttle after like 5-10 seconds rather than letting it sit. Again, I have zero clue how the priority algorithm works, but we can't optimize anything to Runpod's specification cause there is nothing for us to specify. Honestly I'd even write my own priority algorithm if I could.
can you share endpoint id?
that seems like a bug
Ill let @ashleyk ping his endpoint when he can, but b/c I experienced it too:
qie98s97wqvw4t
This one is mine. Ik ashleyk's is more production critical, but it seems like a bug with the priority algorithm then, if me / him are both able to get fully throttled. I mean it's fixed following the steps I said - reset max workers to 0, shift my priorities around, kick CZ out - but I just wonder why I need to do this manually and scale all my workers to 0 myself, rather than the priority algorithm handling it for me.
Also, if editing the workers is sensed and updated, it should really try to recalculate all the throttled workers and begin shifting them over if there is availability. I think that is why ashleyk / I were confused when editing our endpoint and nothing happened.
i see all 21 workers are idle, so whats likely happening is there is a huge spike of work which takes many gpus, and that slows down
U said the throttle is switched out every hour before, is it possible to move 2-3 of them actively before that hour is hit? Also I think its b/c he refreshed all his workers
https://discord.com/channels/912829806415085598/1209942179527663667/1209970269108707398
Where he had to scale them all to zero and back
we will have to optimize that further but right now a huge spike will cause throttle and that will wind down after few mins
this is showing all idle now
I think this is a bug then, its not a few mins
Yeah it is
b/c he changed it
but he obvs had the conversation going longer than 3 mins
maybe ashleyk can share his graph at a closer time scale but im sure he got fully throttled
got it, so he must have reset the workers
oh i do see a throttle spike, then an init spike, so he must have reset it
Yeah, I guess, then my question is this a bug with the priority algorithm?
What do u mean reset?
set max to 0
Okay, so there's no way to do this automatically?
it's not a bug as much as priority algo isn't good
we do it automatically but it occurs hourly, will need to optimize that
we're thinking to just allow users to set a quota per gpu type in addition to assigning launch priority
what happened in the past few days is that a few of our larger customers flexed up 600+ serverless workers
Is it possible to guarantee like a 2 worker minimum to do it immediately? I think that would even fix the current issues
And also, if someone manually changes it, could it start searching for new gpus if any are throttled?
I guess the problem is that ashleyk had to manually scale to 0 in a production env
if we could even scale down to half and scale back up
that be nice
yeah have to optimize that to take these conditions into account
I see, i guess my next question is it possible for me to terminate workers through the graphql endpoint?
https://graphql-spec.runpod.io/#definition-PodStatus
Cause I want to write a script on my server to force minimum workers or terminate throttled workers if I have jobs in the queue, and I need it to be more proactive
Do I treat it like a pod?
yes its similar, i plan to optimize this either way
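For anyone else reading, this is roughly what I'd try - a sketch only; podTerminate is in the public GraphQL spec, but treating a serverless worker id like a pod id is exactly the assumption being confirmed above, so verify it against your own workers:

```python
# terminate_worker.py - sketch of killing a stuck/throttled serverless worker
# via the GraphQL API, treating the worker id like a pod id.
import os
import requests

API_KEY = os.environ["RUNPOD_API_KEY"]

def terminate_worker(worker_id: str):
    query = """
    mutation Terminate($input: PodTerminateInput!) {
      podTerminate(input: $input)
    }"""
    r = requests.post(
        f"https://api.runpod.io/graphql?api_key={API_KEY}",
        json={"query": query, "variables": {"input": {"podId": worker_id}}},
        timeout=10,
    )
    r.raise_for_status()
    return r.json()

# usage: terminate_worker("<worker id from the endpoint's workers tab>")
```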
Yeah, i guess do u have a rough estimate for when it will be optimized?
i guess im looking into it cause I want to start feeding more requests to my LLM / stuff soon, but depending on the time frame Ill just write the script to dynamically set minimum workers if i have to
thank u tho, appreciate that the priority algorithm can be looked into / optimized, and hopefully what it's doing can be shared at some point after it's reoptimized. I guess the fact that it sits throttled for an hour is not a very well known fact.
whats your endpoint id? let me check logs for it
I mean its not an issue for me,
qie98s97wqvw4t
b/c im not in a production env like ashleyk is, im just setting it up so that I can start testing and moving my whole pipeline through, cause I was relying on ChatGPT and it was costing too much. But I commented in b/c when this conversation started, I wanted to share how not using a network volume could give u better availability:
https://discord.com/channels/912829806415085598/1209942179527663667/1209946232131297320
I myself was throttled across the board in my to-be example of why you shouldn't rely on network storage - but honestly, ive posted about this multiple times in the past too, and i guess as zeen said u guys have experienced an insane uptick in the last 3 days
planning on releasing optimizations tomorrow, have to tweak the knobs carefully otherwise it causes network issues
Great, I'm glad. If those optimizations end up being released, do you think you can tell us what they end up being? So we know what the changes are?
Thank you
https://discord.com/channels/912829806415085598/1209973235387474002
Again, I think the biggest issue for @ashleyk (and honestly even anyone else who would be using runpod in production), and why it wouldn't be taken seriously, is b/c if u are fully throttled across the board and have no options to fix availability, that really is the worst nightmare.
ill share what i can here
thanks!
sorry for hammering u guys so much, know there is a lot behind the scenes
we are here to support, something we need to optimize regardless
So basically what you are saying is that money is more important to RunPod than providing a stable service to all customers, and that RunPod can increase the number of workers for larger customers to such an extent that it takes down the endpoints of all other customers?
@flash-singh my endpoint was idle because 24GB tier is unusable and I had to change it 48GB tier and scale it down and back up again because editing the endpoint is shit and can't update automatically.
Yeah, hopefully tho the coming changes that he proposes this week will fix it
https://discord.com/channels/912829806415085598/1209973235387474002/1210002895781625907
Definitely is an issue that I think they will work to address, and let's see where it goes. i am glad to see that the hour throttle will drop down to 4 mins to start swapping things around + allow movement with less restrictions so hopefully runpod's algorithm will be a heck lot more proactive
No we had an internal discussion and all agreed that the quota shouldn't have been increased in this case.
@flash-singh i just want to thank you for your work and your product. Despite some throttling problems our company really appreciates the desire to fix problems instead of ignoring customers as most support teams do
I have a few questions here. What exactly is best practice when availability runs low in the region where we have a network volume? Should we keep endpoints active in multiple regions?
On a similar note, is there a best practice regarding when to use a network volume and when to bundle models into our image? If we have 20gb of models, should that all just be bundled or should we be using a network volume?
I think this should be bundled, tbh. I find < 30gb for the compressed image shown on dockerhub quite safe, this is an example of my Mistral one.
https://hub.docker.com/layers/justinwlin/mistral7b_openllm/latest/images/sha256-47f901971ee95cd0d762fe244c4dd625a8bf7a0e0142e5bbd91ee76f61c8b6ef?context=repo
Haha, I saw you respond in the different thread, but Ill continue to answer here
The number just comes from trial and error anecdotally
If the image gets too big, the download time for serverless initialization becomes impossible. So I find that < 30gb gives a reasonable first initialization time. Once you start pushing that boundary, it personally just feels a bit weird to me.
Ok, I'll give it a shot. That implies that I could ditch the network volume and use the global region which should help tremendously with availability.
The runpod base image is what I tend to use, so there is some size cost there, but if you want to optimize it to the core, I saved maybe 1-2 gb by not using runpod-pytorch as a starting point.
https://github.com/justinwlin/runpodWhisperx/blob/master/Dockerfile
But tbh, nowadays i just end up building on it cause it saves me a lot of headache:
https://github.com/justinwlin/Runpod-OpenLLM-Pod-and-Serverless/blob/main/Dockerfile
Yeah, u wont get locked in per region. Also another thing is the priorities do matter. It tries to assign u a lot of whatever you put as (1) when you first initialize, I try to put a (1) priority on 24gb, or 48gb, but not on 24gb pro.
The 24gb pro and 48gb are very similar in cost, but the 24gb pro just isn't worth the headaches it gives
I also got rid of the EU-CZ-1 region, cause I dont want to get assigned any GPUs from there - that region has some availability issues around the 24gb pro it seems. Im sure the changes Flash is making will get throttled workers to move around way better, but I'd rather just not deal with it
example what i mean
This is helpful, thank you Justin!
Still encountering this issue trying to get 4090s as of this afternoon:
Yep, all my workers are throttled again too, RunPod serverless is pretty unusable at the moment
I even have 2 different endpoints in different regions and they are both throttled
4090s are too high in demand right now and more supply will be added in 1-2 weeks
48gbs were all throttled in CA today too.
Yes my endpoints are 48GB in SE and CA and both fully throttled. Also my 24GB without network storage and thus no region affinity also fully throttled. Serverless is a joke. I'm an enterprise customer but all my endpoints are fully throttled and cannot get support from RunPod so I'm taking my business elsewhere because this is totally unacceptable @flash-singh @Zeen @JM
Hey, I know not much I can say after the fact can fix past pain, but we have made a few platform releases to improve the throttling in the past day and added more capacity (way more coming next week). We've got a lot of customers using serverless and we've experienced a spike in consumption that is just enormous, and we're trying our best to handle it. We apologize for affecting your business and we are trying our best to find a balance between action and messaging.
Still getting throttled constantly. Serverless doesn't seem viable in its current state. Bummer. The tech is cool.
It's insane to me that I'm just getting throttled out of the blue without a heads-up
All of my workers just won't start and every previously working GPU is now unavailable
This happened yesterday in EU-SE-1 and now today in EU-NO-1
What's happening? @Zeen @flash-singh @JM
Looks like RunPod may have fixed something around 3.5 hours ago; all my endpoints' throttled workers seem to have recovered around the same time.
Looks like I spoke too soon, they looked better for a short while, now getting throttled again.
This sucks
Basically no GPUs available in NO, SE has some 16GB and 24GB
SE
NO
I don't understand whats going on though because in NO I have no throttled workers.
same issue with throttled workers... personally think RunPod has to scale up at this point ASAP
previously we could get by with just using A5000s and only 4090s were in throttling hell... but now, even that is throttled indefinitely
the issue has been happening for several days now, and the obvious solution of "just use 'active workers' " isn't really viable at our small scale, because doing that would be just like paying for the machines directly...
we are running a community supported project
The lack of communication is really concerning
Same here, running production site. This happened to me before (I moved from US to EU) for availability and now it happened in EU again.
we have tweaked the algos but at certain points in the day the spikes eat up all the capacity, we are adding more gpus this week for A5000 and 4090s
I think you need to add more network capacity too, too many machines on the same network seem to be causing issues where everyone is experiencing slow speeds, serverless getting connection timed out issues, people's pods disappearing etc etc.
I just had to terminate workers for an endpoint because they were getting stuck for 5 mins on a job that takes 14 seconds, due to network connectivity issues. Then a new worker spawned and also got stuck, eating up all my credits, and the job doesn't even get processed, it gets stuck on IN_PROGRESS.
My manager has demanded a refund for this because its unacceptable.
This also happens to us... we were getting charged for 10+ minutes for a worker that kept "queueing image for pull" and the job was still IN_QUEUE... I was gonna report it but I didn't know if we were actually being charged or if it was just a UI thing
we chewed through $3 of credits in ~24 hours when we usually only spend $0.74/day as per our size... and our jobs only took 2-3s
it actually happened twice, and that was when I was there to see it... so it's definitely been doing that multiple times per hour
@marshall are you using the latest tag for your Docker image?
we have our own tagging system that tags images based on the commit message
I don't think it's very much relevant to the issue, but the tag was sm-q, hosted on our private docker registry
Is it possible to push a new image to the same tag?
I guess so-? but runpod caches the images per-datacenter, so that usually just happens in development... which is why we have semver for dev images
the image pulls just fine and we use it in prod, the issue is on the worker's side... infinitely "queuing image for pull" and us getting charged for a job that's not even in progress
the issue occurred again:
it's been doing that for 3 minutes.
and we're getting charged for it...
so far in the past 30 minutes, runpod has chewed through 10 cents
if we calculate how many requests that would've been: 0.1 / (0.00026 * 3), it means we should've been receiving ~128.21 requests in that past 30 minutes
this doesn't look like 128 requests to me:
not even close
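(Spelling that arithmetic out, under the assumption that ~$0.00026/s is this worker size's per-second rate and each job takes ~3 s: $0.00026 × 3 ≈ $0.00078 per job, so $0.10 / $0.00078 ≈ 128 jobs in those 30 minutes.)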
@flash-singh sorry for the direct ping but uh, it's actually chewing through our balance, another 8 cents has just been deducted.
what do we do?
2 cents deducted out of nowhere, there are no jobs running across all endpoints
its just 1 worker? terminate that for now, ill look into the bug
we tried setting max worker count to 8 to try and see if that will improve the delay time... it didn't
due to throttled workers?
Yupp
higher max workers can help but ideally much of the compute is saturated and expansion is already planned this week for some gpus
What we're also thinking is that it might be deducting from cancelled jobs
the timer goes up each refresh, and these jobs were previously cancelled due to them taking too long... and our systems just cancelled them to prevent too much usage...
the timeout is set to 120s (queueing included)
cancelled wont charge once triggered, we stop those workers running the job
holy crap
I think the best way to go for now is to shutdown our AI chatbot feature until this infrastructure issue is fixed
we can't have our contributors' money wasted over runpod's scaling issue
if this goes unwatched, who knows how much money it'll siphon out
and we aren't certain if we're going to get refunded for this
tried contacting sales... welp.
currently trying to run a smaller version of our model on 16GB temporarily
1 week of downtime is too big of an impact for us apparently
@marshall hey was your issue ever resolved? I looked through my logs and saw a sudden huge spike in credit consumption for just a couple jobs. It looks like the "delay" time it took to even run the job was counted into the actual gpu usage :T
I'd like to add it was also on the same dates as your issues. Feb 24/25
got in touch with sales, they gave back the burned credits based on our 30 day average
Right now we're running the model on 16GB which is a bit more expensive due to the longer inference time (despite being 30% cheaper, the model took 60% longer to produce output)
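(Rough check of that, assuming per-second billing: 0.70 × 1.60 ≈ 1.12, so the cheaper 16GB tier works out to roughly 12% more per job once the longer inference time is factored in.)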
so ideally we should go back to 24GB, but we'll have to wait for RunPod's announcement regarding GPU availability... According to sales:
"It's probably going to be a gradient over time rather than a binary state of being resolved/not resolved since we add more capacity on a weekly/biweekly basis; we do announce big supply adds on Discord when they come through so that's probably the best way to keep updated"which is their answer when I asked "if/when the issue would get resolved"
Thanks a ton for the response! I contacted them directly as well for now. Good to hear your side got (mostly? kinda?) resolved :]
Still not fully resolved but at least they refunded the credits xd
Job execution times are normal, but the delay time caused a huge spike in credit consumption :[
Good to hear they refunded your side. Hoping for the same
How do you contact sales? I need to contact them for a refund too..
I used their chat on their site. It's in the lower bottom right
Hey @marshall @HyS | The World of Ylvera @ashleyk
I onboarded a huge load of hardware. However, the minimum RunPod should be able to do, is provide high quality communication, which I see wasn't ideal.
Zhen, Pardeep, Justin and I have been pushing hard on at least 5 different features to make Serverless much better at managing huge loads. Secondly, we hired 3 support staff and 2 cloud engineers, and are looking for more support engineers as well. Communication must improve, and it will, trust me.
That being said, we value relationship above all else. All else. Hit me up in private and we will provide compensation for you.
That's a great resolution!
For now I dmed you. Thank you for the ping!
moved into DMs
Sure, thanks both!