24GB PRO availability in RO
I switched from the 24GB tier in RO to 24GB PRO to benefit from the higher availability of the 4090s in RO, but most of my workers are becoming throttled again.
i would mix them, 4090s get relatively high spikes
I've never seen that priority thing actually working ever though
Even if all my workers become throttled, it doesn't initialize the 2nd choice, they just stay throttled
what it will do is pick a GPU that is available and split between the two based on availability
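A toy sketch of one way to read "split between the two based on availability". This is only an illustration, not how the scheduler is known to work: the function name `split_workers`, the proportional rule, and the availability numbers are all assumptions made up for the example.

```python
def split_workers(total: int, available: dict[str, int]) -> dict[str, int]:
    """Toy model (assumption): split workers across GPU tiers in
    proportion to how many GPUs each tier currently has free."""
    capacity = sum(available.values())
    if capacity == 0:
        # Nothing is free anywhere, so every worker would stay throttled.
        return {gpu: 0 for gpu in available}
    total = min(total, capacity)  # cannot place more workers than free GPUs
    shares = {gpu: total * n / capacity for gpu, n in available.items()}
    alloc = {gpu: int(share) for gpu, share in shares.items()}
    # Hand out the workers lost to rounding, largest remainder first.
    leftover = total - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda g: shares[g] - alloc[g], reverse=True)
    for gpu in by_remainder[:leftover]:
        alloc[gpu] += 1
    return alloc


# Hypothetical availability numbers for the two tiers mentioned in the thread.
print(split_workers(5, {"24GB PRO": 2, "24GB": 8}))
```

In this model, a tier with few free GPUs only ever receives a small share of the workers, which is one reason mixing tiers can help when a single tier fills up.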
just wondering, i am trying to make a new endpoint and instantly all 5 workers are throttled before initialization, so i had to add new endpoints and hopefully i can get some unthrottled to just initialize.
but why would it state high availability if i just immediately get throttled on initialization? what does high availability mean then?
@justin how long are you waiting after setting up an endpoint?
Initial setup does take a decent amount of time. Also, are you using 10+ max workers?
Been a weird situation. I've been launching endpoints, but when a worker hits idle and I send a request, it just starts downloading again. So I've been deleting the endpoint, thinking maybe I need to wait for all my docker push iterations in the background to settle down; maybe conflicting hashes are causing redownloads. https://discord.com/channels/912829806415085598/1208257003131113502 Usually I wait about 10-20 mins in the background right now and see if it works. I'm also trying to solve a bug right now where my worker works on a GPU pod, but something about it crashes on serverless.
And no, I'm just at 3 max workers, so it spins up 5 potential workers
I don't want to spin up 10+ max workers, cause I don't have enough limits to waste workers like that
But yeah, to answer this: usually about 10-20 mins, then I see if it switched from an initializing state to an idle state
@justin Use 10, I give you full permission 😊
can i get an upgrade on worker limits at some point haha, but ok
Personally, I like putting 10, with 1-2 active workers, for the initial setup
i see
why does that change?
is it just to capture
some good gpus to initialize?
Then, send some requests, check if those processed, then if so, remove the active
ah got it good to know
huh
Simply my own opinion of an efficient way of checking a new endpoint, I am far from being an expert though, don't get me wrong haha
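A minimal sketch of the check described above (10 max workers with 1-2 active for the initial setup, send some requests, then remove the active workers once they process). `EndpointSettings` and both helper functions are hypothetical names for illustration only; they are not part of any RunPod SDK, and applying the settings to a real endpoint is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EndpointSettings:
    """Hypothetical holder for the two settings discussed in the thread."""
    max_workers: int
    active_workers: int


def initial_settings(max_workers: int = 10, active_workers: int = 1) -> EndpointSettings:
    """Settings suggested for the first setup of a new endpoint."""
    if not 0 <= active_workers <= max_workers:
        raise ValueError("active_workers must be between 0 and max_workers")
    return EndpointSettings(max_workers, active_workers)


def after_smoke_test(settings: EndpointSettings, results: list[bool]) -> EndpointSettings:
    """Remove the active workers only if at least one test request was
    sent and every test request was processed; otherwise change nothing."""
    if results and all(results):
        return replace(settings, active_workers=0)
    return settings


settings = initial_settings()                        # 10 max, 1 active
settings = after_smoke_test(settings, [True, True])  # both test requests processed
print(settings)
```

The test requests themselves are assumed to be sent separately; the sketch only captures the decision of when to drop the active workers.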
What's your endpoint ID?
I can check it out
AH it finally works
nah its all good xD
i just ended up
increasing things
to not just be 4090s
I guess the thing i had before was
i only had it on the 24 GB PRO / 4090 cause it said high availability
and i didn't wanna run into like an out-of-memory error
but what fixed it just now for me was just extending the options
Well, even just 24gb pro should work
interesting
But use more max workers, trust me!
ok haha
i guess i'm just running out of workers as i deploy more
If you activate flash boot, it doesn't work very well for small max workers
but good to know
It gets exponentially better with more workers
Give me ID, I will give you more 😂
our 4090s in eu-ro come in 2x or 3x servers, they fill up easily and cause throttle
8x servers are better but sadly 8x 4090 servers are not easy to come by