Railway•6mo ago

Should we expect an increase in bandwidth usage with V2 runtime?

We recently (amongst many other changes..!) swapped our API gateway to V2 and saw a large increase in bandwidth. Is this expected? Swapping back to legacy "fixed" it. I can investigate further and try to produce a "clean" repro (turn on V2, redeploy, run for a few hours, and revert) if this is not expected behaviour. My other services showed similar bandwidth increases; however, the data there is much noisier due to all the other changes I was making at the time. Project ID: 4c3b4b0e-006a-407e-90c7-9c3031cd622f. The image shows the window of time we had V2 runtime enabled.

57 Replies

Percy•6mo ago

Project ID: 4c3b4b0e-006a-407e-90c7-9c3031cd622f

Brody•6mo ago

I would very much appreciate if you could come up with a clean way to reproduce this

ProdiggaOP•6mo ago

im running a test today, will get back with result. i've just swapped our gateway over to v2 again. ill run it for a bit and show a comparison

Brody•6mo ago

im not too sure if an in-use app is a very good reproducible example

ProdiggaOP•6mo ago

i dont have time to put in any more effort, unfortunately. I am the only programmer at my studio and i am spread too thin. thats why we use platforms like railway! so ALL I changed was the runtime to V2 on our app. nothing else was changed. I clicked V2, got prompted to deploy the changes, I hit ok and thats all

ProdiggaOP•6mo ago

our bandwidth use just doubles when we enable v2 runtime, along with the estimated bill for the month etc. i am now reverting the change ($$$$) and can report back if and when the bandwidth drops back in line. project id is 4c3b4b0e-006a-407e-90c7-9c3031cd622f and the service in question is 3545427b-d98c-42ec-b5ac-f9cc4326e3c4, if any railway dev wants to poke around and investigate. i guess its more than double..! almost triple 🙂

Brody•6mo ago

i created an example project with 3 services: 2 services that download a file on a loop with a fixed download size and download speed, and a third service that serves the file. one of the download services used the legacy runtime, and the other used the v2 runtime. i am unable to reproduce; in fact the v2 runtime uses a tiny bit less network

ProdiggaOP•6mo ago

thanks for trying to reproduce it!

Brody•6mo ago

ignore the large bumps, i was dialing in the settings so as not to rack up a massive bill

ProdiggaOP•6mo ago

i dont know what it might be. but from what i understand legacy will eventually be disabled and we will be pushed onto v2. and v2 is supposed to 'just work' with no changes right? its not something we need to concern ourselves with? nice! yeah, do you have any idea what it might be? perhaps regions are involved? we host on US East

Brody•6mo ago

i dont think its just the v2 runtime, with your app there are many other factors at play

ProdiggaOP•6mo ago

maybe private traffic is being counted incorrectly in v2 runtime when your region is not the default? though again, with this one change, it will more than triple our bandwidth bill, and if its something thats just supposed to work then i think its something that railway may want to investigate before pushing it to their users?

Brody•6mo ago

private traffic shouldnt be counted at all, regardless of region

ProdiggaOP•6mo ago

i know

Brody•6mo ago

but good idea. well, v2 is the default for all new services

ProdiggaOP•6mo ago

yeah I noticed - i recently split the responsibilities of a DIFFERENT service (in the same project) in 2. the service was doing 2 jobs at once, essentially: a REST API and a socketio/realtime comms service (chat, etc). I basically added a switch to make the service act as one or the other, because i wanted to get a good idea how much of our bandwidth bill was coming from the socketio/realtime stuff vs rest api external database queries. so anyway, that service was using LEGACY (its been around for a while). i split it into two, made the existing service into Rest API only, and the NEW service I made into the socketio/realtime service.. the NEW service was automatically v2 runtime. bandwidth usage was HUGE again, like a 3x jump in normal usage. i eventually figured out the v2 switch was 'to blame'. set it to Legacy and now old service + new service bandwidth = old combined service bandwidth, as expected

Brody•6mo ago

didnt you say that websocket connections failed on the v2 runtime, or was that the edge proxy?

ProdiggaOP•6mo ago

no the websocket connections failed in 'edge proxy' (Though I may have messed up my words, sorry - i was knee deep in a bunch of problems when i was debugging all that, as you can tell)

Brody•6mo ago

i ran my test with the edge proxy on, im going to disable that and try again

ProdiggaOP•6mo ago

the gateway has edge proxy enabled! (the one from the test today) my current config is:
Gateway: Edge Proxy ON, Runtime: Legacy
Rest Api: Edge Proxy ON, Runtime: Legacy
SocketIO: Edge Proxy OFF, Runtime: Legacy

Brody•6mo ago

what service are these graphs from?

ProdiggaOP•6mo ago

the graphs are from API Gateway. here are the moments before i SPLIT my restapi+socket io service into TWO the other day

ProdiggaOP•6mo ago

the purple lines are the Socketio/rest api services (you can see where I split it into two (two purple lines) and enabled v2 runtime)

Brody•6mo ago

did you ever get any errors from the socketio service when you said the edge proxy wasnt working for you?

ProdiggaOP•6mo ago

and the YELLOW is my api gateway where i ALSO enabled v2 runtime at the same time. then i swiftly reverted v2 -> legacy, and you can see my traffic back to expected levels -- where API gateway (yellow) looks usual, and the two purple lines 'add up' to approx what the traffic was before the split. i have not investigated that yet, i am not sure when i will have a chance to at the moment. i will open a separate help thread for that if i can confirm that Edge Proxy -> ON just breaks my Socket IO functionality. just trying to think of what else is 'strange' about my setup, but, mm, the region being different is the only thing i can think of that isn't "stock standard". the nodejs app is just a nestjs app. especially the api gateway one is VERY straightforward and simple. it just proxies requests to one of 3 (dev/stg/prd) servers (using internal url) based on some headers in the request. it exposes a health check endpoint. it also exposes an endpoint to query info about the 3 servers. finally, it has a redis client connection to receive updates about changes to those 3 servers (rare occurrence, once a week or two when I push out an update to the game)

ProdiggaOP•6mo ago

API gateway, just now, 20 mins after reverting back to Legacy:

Brody•6mo ago

and just to be clear, the service works just fine on the v2 runtime right?

ProdiggaOP•6mo ago

yeah! OH also just remembered we also have an OTEL collector that my server is reporting its data to (again, internal), so the start command for the api gateway is actually:
node --require '@opentelemetry/auto-instrumentations-node/register' dist/apps/ssr-api-gateway/main
to do all the opentel auto instrument stuff. i can test with it disabled + v2 maybe

Brody•6mo ago

are you sure there arent errors anywhere, and something is going into a retry loop and bloating the bandwidth?

ProdiggaOP•6mo ago

as a side note, for the window of time we were running on V2 runtime, the API gateway's response time was great and so stable 😛

ProdiggaOP•6mo ago

possible? I can never discount it i guess? but it would have to be in response to a client request. no one else pokes this server, just requests from clients in the game, and the request is then proxied and the response sent back. but logs are clean, and I DO get errors when proxies fail in other cases

Brody•6mo ago

then that rules that out

ProdiggaOP•6mo ago

just double checking logs. yeah, nothing suspicious. no errors in the last 5 hours, except for when i reverted back to legacy and redeployed 😛 ill try v2 runtime w/o otel instrumentation, just in case

Brody•6mo ago

hypothetically, what would happen if railway isnt able to determine the cause of your increased network usage? (its still the weekend, i cant bring anyone in yet anyway)

ProdiggaOP•6mo ago

i would stick to legacy, and if legacy is going to be removed, assuming the bandwidth costs dont come down by that time, then we will have to leave! I know bare metal is around the corner though and I am in talks with some lovely folk at RW about trying it out for some of our bandwidth heavy services. (they've been lovely to deal with) we also have some major bandwidth optimisations coming soon so that will help bring the cost down too! but to give you an idea, if our bandwidth use just tripled then we would be paying about 1500 USD for bandwidth which is, ugh, a LOT for us. It would be worth investing effort to port somewhere else with cheaper costs at that point.

Brody•6mo ago

1500usd is nothing

ProdiggaOP•6mo ago

haha what are YOU doing! 😆

Brody•6mo ago

tests to try and reproduce your issue

ProdiggaOP•6mo ago

ah, network test 😉 oh boy, i hope railway slashes that bill for you for helping people out

Brody•6mo ago

conductors get a 100% off coupon

ProdiggaOP•6mo ago

oh well choo choo

Brody•6mo ago

choo choo indeed. i will be bringing in char (who i think has the most to do with the v2 runtime) as soon as i feel he's available

ProdiggaOP•6mo ago

no dice with disabling otel, still high bandwidth. sweet thanks, but no rush since we can just stick to legacy for now! how curious. hey by the way, if i wanted to put together as minimal of a repo as i could, whats an easy way to load test? spin up two services and get them to talk to each other via public url? my repro service will be a stripped down nestjs rest api that "proxies" messages

Brody•6mo ago

i just have this

Brody•6mo ago

the service in the middle serves an infinite file (in the sense that the response is null bytes) and the downloader services request a 1gb file from it on a loop and download it at a fixed 5MB/s. super controlled environment with no variables other than v2 or legacy

ProdiggaOP•6mo ago

did you just whip up the code for that downloader service yourself?

Brody•6mo ago

yeah

ProdiggaOP•6mo ago

sweet, cool

Brody•6mo ago

all go services

ProdiggaOP•6mo ago

nice, i want to learn go. it seems nice, light, powerful

Brody•6mo ago

indeed it is. ill try to talk to char about this when hes in tmr

ProdiggaOP•6mo ago

Which one? A through to Z. You've got a lot to pick from? I'll see myself out

Brody•6mo ago

"this" being the topic of the title for this thread

ProdiggaOP•6mo ago

I just meant because you were going to speak to "char"... You know what never mind. It was a terrible joke haha

Brody•6mo ago

ohhhhh I see what you mean
