urgent: horizontal scaling limited at 5 replicas
Hi! We're on the Teams plan. We're trying to scale as fast as we can, but scaling is capped at 5 replicas. Is there anything we can do to get past that?
Project ID:
120b5ec5-59d8-4087-84ae-4e0b3d934aa7
now hold on, you have access to 32 vCPU and 32 GB of RAM, and you are still hitting those limits?
yes
do you know how expensive the bill is gonna be?
I mean absolutely no offense at all when I say this, but I think you may be running into inefficiencies in your code and are trying to throw compute at the problem
do you work for railway?
I do not
then that's not useful
okay okay fair
we can add multithreading later but rn i can't exactly rewrite shit
@Angelo - pulling you in
thank you
Indeed you can!
Real quick- why?
What are you hosting?
rizzgpt.app
OH SICK
congrats
ok- jumping on and scaling
wow that is sick, that's come a long way since you first showed it off
tysm
Angelo, I thought the replica limit was increased to 10? was the frontend ui not updated to allow that yet?
Yea, I am confused about that
I just grabbed lunch- going to work through this
were you able to manually scale it?
well- I am more concerned that you can't set it past five
but will knock that out for you too
so it seems that you aren't hitting your limits?
ahhh
nvm
L
https://railway.app/project/120b5ec5-59d8-4087-84ae-4e0b3d934aa7/service/d89b8d57-7c05-4bab-bf8a-27bc11f78cbc/metrics
this right?
tagging to confirm @theodor
So looking at your logs, you seem to be processing a lot of requests that don't need to be processed. It seems that you are hitting /refresh a crazy number of times when you shouldn't be
anyway, I digress
@Angelo that's right!
sorry
i just fixed a bug that should make things much better
we may be able to stick with 5
/refresh token you mean?
maybe there's something funky we're doing
yea
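The actual fix isn't shown in the thread, but a common way to stop hammering a /refresh endpoint is to cache the token client-side and only refresh when it is close to expiring. A minimal sketch along those lines, with the endpoint URL, response field names, and expiry margin all assumed rather than taken from RizzGPT's code:

```python
import time
import requests

# Hypothetical sketch only -- the real fix isn't shown in this thread.
# Cache the access token and hit /refresh only when it is about to expire,
# instead of refreshing on every request.

REFRESH_URL = "https://example.com/refresh"  # placeholder endpoint

_token = None
_expires_at = 0.0

def get_access_token(refresh_token: str) -> str:
    """Return a cached access token, refreshing only near expiry."""
    global _token, _expires_at
    if _token is None or time.time() > _expires_at - 60:  # 60 s safety margin
        resp = requests.post(REFRESH_URL, json={"refresh_token": refresh_token})
        resp.raise_for_status()
        data = resp.json()
        _token = data["access_token"]                      # assumed field name
        _expires_at = time.time() + data.get("expires_in", 3600)
    return _token
```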
either way!
it's a bug on our end
we are fixing the cap
tysm
but rn it's 10?
deploying new fix
should be 20
ok
we can ask to reduce it later once we don't need it
gotta scale
yeah
you can lower the number
(ideally this would be based on load)
yeah
thanks so much for helping us here btw!
np! hit us up in this thread if you run into some challenges
(also if you shout us out I will retweet hehe)
thank you!
oh yeah definitely
let me do it
so far so good
we also fixed a bug on our end
@Angelo Can you bump us to 15? We're soon going to deploy some parallelization changes but it seems things are creeping
is it appropriate to say, suffering from success?
a little bit haha
the UI should be updated, can you type in 15 replicas and see if that works?
let me check! thanks
it worked! redeploying
@Angelo @Brody 🙏 QQ - we're trying to add multiprocessing, but we need to use a Dockerfile. How does port allocation work in this case with replicas, since I know Railway injects PORT during build time?
Or I guess a better question is - how do the replicas work behind the scenes?
you don't need to do anything different, that part works exactly the same
it's just your service being duplicated your chosen number of times, and then each incoming request is proxied to one of the services, with (I think) round robin
are the replicas in separate "physical" instances? like do they have independent ports
since the PORT variable is auto-generated, they would have different ports, yes
If you define a PORT variable they should(?) have the same port
even if you set a specific PORT in the service variables, it doesn't make a difference, it would work the same
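To make that concrete: a minimal gunicorn config sketch for a Django/ASGI service like the one discussed further down, which just binds to whatever PORT the platform injects. The gunicorn.conf.py file itself, the 0.0.0.0 bind, and the 8000 fallback are assumptions; the uvicorn worker class and 4 workers mirror the start command that comes up later in the thread.

```python
# gunicorn.conf.py -- minimal sketch; the file name, bind address, and 8000
# fallback are assumptions, not something Railway requires.
import os

# Each replica runs the same image and simply binds to the PORT it is handed,
# so the app doesn't need to know how many replicas exist or which one it is.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"
worker_class = "uvicorn.workers.UvicornWorker"
workers = 4
```

You would run it with something like `gunicorn -c gunicorn.conf.py backend.asgi:application`, the same way with 1 replica or 15.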
if your next question is "can the replicas share data between each other" the answer is no, not natively
haha that wasn't the question I had but one sec, need to look into something
Do you assign some sort of unique identifier that the replica would know? Like a REPLICA_ID env variable that's unique to the replica
There's a workaround but just wanted to ask
indeed
RAILWAY_REPLICA_ID
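For example, a replica could read that variable at startup to tag its log lines; the "local" fallback below is just an assumption for runs outside Railway.

```python
import os

# RAILWAY_REPLICA_ID is set per replica (mentioned above); the fallback is
# only for local runs where the variable isn't present.
REPLICA_ID = os.environ.get("RAILWAY_REPLICA_ID", "local")

print(f"[replica {REPLICA_ID}] starting up")
```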
sick
very sick
same port, but we proxy it on your behalf; we are going to open up the internal networking we do for you to make this more customizable
cough private networks cough
I see I see
So I tried increasing the number of uvicorn workers and the vCPU is looking mad crazy - would love to understand why 👀
ah never mind, it just came down. Seems like it was lagging
what have you increased workers to?
you should be able to go up to 65 with 32 vCPUs
Just trying to gauge whether increasing workers on gunicorn is better than increasing replicas on Railway's side
Right now, I've configured it to be 3 replicas and 4 workers
depends, what do your cpu metrics look like
After 1:30pm is 2 replicas and 4 workers - before 1:00pm is just 7 replicas
I think it makes sense and I was just confused. Will let you know if I see any other issues
fyi this is the script we're using on start
service datadog-agent start && python manage.py migrate && ddtrace-run gunicorn backend.asgi:application -k uvicorn.workers.UvicornWorker --workers=4
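The "65 workers on 32 vCPUs" figure above matches the common (2 × CPUs) + 1 gunicorn heuristic. A sketch of deriving the worker count from the environment instead of hard-coding --workers=4 in that start command; the WEB_CONCURRENCY variable name is an assumption, not something Railway sets for you.

```python
# Sketch: pick the gunicorn worker count at runtime instead of hard-coding
# --workers=4. WEB_CONCURRENCY is an assumed variable name; (2 * CPUs) + 1 is
# the usual ceiling and matches the "65 workers on 32 vCPUs" figure above.
import multiprocessing
import os

def worker_count() -> int:
    configured = int(os.environ.get("WEB_CONCURRENCY", "4"))
    ceiling = 2 * multiprocessing.cpu_count() + 1
    return min(configured, ceiling)
```

Dropped into a gunicorn.conf.py as `workers = worker_count()`, this would let one build be tuned between "more replicas" and "more workers" with an env variable instead of a redeploy of the start command.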
Actually, on a second look, it does seem like the gunicorn workers are costing significantly more vCPUs - with only replicas (7 replicas), it seems like vCPU usage was 6~8, but with gunicorn it does seem a lot spikier
ah growing pains
@Angelo so under the hood, is each replica more like spinning up k8s pods? Because if that's the case we shouldn't use any workers on gunicorn
From the speed in which the replicas get spun up I would assume that this is the case
I'm pretty sure Railway just builds the image once, runs the image your chosen number of times, then load balances incoming requests out to the replica set
yep- nothing crazy, just num containers, kinda like how docker compose would do it
hmm I wonder why it's like that?
https://twitter.com/theomarcu/status/1664319413854650379?s=20
Here's a tweet giving you a shoutout!
Theodor Marcu (@theomarcu)
Honestly we wouldn't have been able to sustain 10x traffic to @RizzGPT_ over the past few days without the help of the @Railway team
Thanks again for the help @Angelo and @Brody
Things are more manageable now!
wait- were you at Princeton reunions?
haha not this year!
i wish
I didn't go there, but I went to reunions and it was insane
haha that's awesome
hope you had a lot of fun!
I did!
ty for the shoutout! David (our twitter guy) appreciates it!
haha anytime! we also appreciate the RT
and glad you did! we were sadly too swamped with all the craziness to be there
I get that