Railway • 6mo ago
Prodigga

Should we expect an increase in bandwidth usage with the V2 runtime?

Recently (amongst many other changes!) we swapped our API gateway to V2 and saw a large increase in bandwidth. Is this expected? Swapping back to legacy "fixed" it. I can investigate further and try to produce a "clean" repro (turn on V2, redeploy, run for a few hours, and revert) if this is not expected behaviour. My other services showed similar bandwidth increases, but the data there is much noisier due to all the other changes I was making at the time. Project ID: 4c3b4b0e-006a-407e-90c7-9c3031cd622f. The image shows the window of time we had the V2 runtime enabled.
[Image: bandwidth graph covering the window when the V2 runtime was enabled]
57 Replies
Percy • 6mo ago
Project ID: 4c3b4b0e-006a-407e-90c7-9c3031cd622f
Brody • 6mo ago
I would very much appreciate it if you could come up with a clean way to reproduce this.
Prodigga (OP) • 6mo ago
I'm running a test today and will get back with the results. I've just swapped our gateway over to V2 again; I'll run it for a bit and show a comparison.
Brody • 6mo ago
I'm not too sure an in-use app is a very good reproducible example.
Prodigga (OP) • 6mo ago
I don't have time to put in any more effort, unfortunately. I am the only programmer at my studio and I am spread too thin. That's why we use platforms like Railway! So ALL I changed was the runtime to V2 on our app; nothing else was changed. I clicked V2, got prompted to deploy the changes, hit OK, and that's all.
Prodigga (OP) • 6mo ago
[image]
Prodigga (OP) • 6mo ago
Our bandwidth use just doubles when we enable the V2 runtime, along with the estimated bill for the month, etc. I am now reverting the change ($$$$) and can report back if and when the bandwidth drops back in line. The project ID is 4c3b4b0e-006a-407e-90c7-9c3031cd622f and the service in question is 3545427b-d98c-42ec-b5ac-f9cc4326e3c4, if any Railway dev wants to poke around and investigate. I guess it's more than double..! Almost triple 🙂
Brody • 6mo ago
I created an example project with 3 services: 2 services download a file on a loop with a fixed download size and download speed, and the third service serves the file. One of the download services used the legacy runtime and the other used the V2 runtime. I am unable to reproduce; in fact, the V2 runtime uses a tiny bit less network.
Prodigga (OP) • 6mo ago
Thanks for trying to reproduce it!
Brody • 6mo ago
Ignore the large bumps; I was dialing in the settings so as not to rack up a massive bill.
[image]
Prodigga (OP) • 6mo ago
I don't know what it might be, but from what I understand, legacy will eventually be disabled and we will be pushed onto V2. And V2 is supposed to "just work" with no changes, right? It's not something we need to concern ourselves with? Nice, yeah. Do you have any idea what it might be? Perhaps regions are involved? We host on US East.
Brody • 6mo ago
I don't think it's just the V2 runtime; with your app there are many other factors at play.
Prodigga (OP) • 6mo ago
Maybe private traffic is being counted incorrectly on the V2 runtime when your region is not the default? Though again, with this one change, our bandwidth bill more than triples, and if it's something that's just supposed to work, then I think it's something Railway may want to investigate before pushing it to their users?
Brody • 6mo ago
Private traffic shouldn't be counted at all, regardless of region.
Prodigga (OP) • 6mo ago
I know.
Brody • 6mo ago
But good idea. Well, V2 is the default for all new services.
Prodigga (OP) • 6mo ago
Yeah, I noticed. I recently split the responsibilities of a DIFFERENT service (in the same project) in two. The service was essentially doing two jobs at once: a REST API and a Socket.IO/realtime comms service (chat, etc.). I basically added a switch to make the service act as one or the other, because I wanted to get a good idea of how much of our bandwidth bill was coming from the Socket.IO/realtime stuff vs. the REST API's external database queries.
Anyway, that service was using LEGACY (it's been around for a while). I split it into two: I made the existing service REST API only, and the NEW service I made into the Socket.IO/realtime service. The NEW service was automatically on the V2 runtime. Bandwidth usage was HUGE again, like a 3x jump over normal usage. I eventually figured out the V2 switch was "to blame". I set it to Legacy, and now old service + new service bandwidth = old combined service bandwidth, as expected.
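The "switch" described above, one codebase that runs as either the REST API or the realtime service, can be sketched as a startup mode flag. The actual service is NestJS; this Go version only illustrates the idea, and `SERVICE_MODE`, `startREST` and `startRealtime` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Mode selects which half of the former combined service this process runs.
type Mode string

const (
	ModeREST     Mode = "rest"
	ModeRealtime Mode = "realtime"
)

// parseMode reads the hypothetical SERVICE_MODE value. An empty value defaults
// to REST so the existing deployment keeps working without new config.
func parseMode(raw string) (Mode, error) {
	switch strings.ToLower(strings.TrimSpace(raw)) {
	case "", string(ModeREST):
		return ModeREST, nil
	case string(ModeRealtime):
		return ModeRealtime, nil
	default:
		return "", fmt.Errorf("unknown SERVICE_MODE %q", raw)
	}
}

func main() {
	mode, err := parseMode(os.Getenv("SERVICE_MODE"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	switch mode {
	case ModeREST:
		fmt.Println("starting REST API handlers only")
		// startREST() would register the HTTP routes here (hypothetical).
	case ModeRealtime:
		fmt.Println("starting realtime handlers only")
		// startRealtime() would register the socket handlers here (hypothetical).
	}
}
```

Because both services then share one build, each can be given its own runtime setting in Railway, which is what made the per-service bandwidth comparison possible.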
Brody • 6mo ago
Didn't you say that WebSocket connections failed on the V2 runtime, or was that the edge proxy?
Prodigga (OP) • 6mo ago
No, the WebSocket connections failed with the "edge proxy" (though I may have messed up my words, sorry; I was knee deep in a bunch of problems when I was debugging all that, as you can tell).
Brody • 6mo ago
I ran my test with the edge proxy on; I'm going to disable that and try again.
Prodigga (OP) • 6mo ago
The gateway has the edge proxy enabled! (The one from the test today.) My current config is:
Gateway: Edge Proxy ON, Runtime: Legacy
REST API: Edge Proxy ON, Runtime: Legacy
SocketIO: Edge Proxy OFF, Runtime: Legacy
Brody • 6mo ago
What service are these graphs from?
Prodigga (OP) • 6mo ago
The graphs are from the API gateway. Here are the moments before I SPLIT my REST API + Socket.IO service into TWO the other day:
Prodigga (OP) • 6mo ago
[image]
Prodigga (OP) • 6mo ago
The purple lines are the Socket.IO/REST API services (you can see where I split it into two, giving two purple lines, and enabled the V2 runtime).
Brody • 6mo ago
Did you ever get any errors from the Socket.IO service when you said the edge proxy wasn't working for you?
Prodigga (OP) • 6mo ago
And the YELLOW is my API gateway, where I ALSO enabled the V2 runtime at the same time, then swiftly reverted V2 -> Legacy. You can see my traffic go back to expected levels: the API gateway (yellow) looks usual, and the two purple lines "add up" to approximately what the traffic was before the split.
I have not investigated that yet, and I am not sure when I will have a chance to at the moment. I will open a separate help thread if I can confirm that Edge Proxy ON just breaks my Socket.IO functionality.
Just trying to think of what else is "strange" about my setup, but, mm, the region being different is the only thing I can think of that isn't "stock standard". The Node.js app is just a NestJS app. The API gateway in particular is VERY straightforward and simple: it just proxies requests to one of 3 (dev/stg/prd) servers (using the internal URL) based on some headers in the request. It exposes a health check endpoint and an endpoint to query info about the 3 servers. Finally, it has a Redis client connection to receive updates about changes to those 3 servers (a rare occurrence: once every week or two, when I push out an update to the game).
Prodigga (OP) • 6mo ago
API gateway, just now, 20 mins after reverting back to Legacy:
[image]
Brody • 6mo ago
And just to be clear, the service works just fine on the V2 runtime, right?
Prodigga (OP) • 6mo ago
Yeah! OH, I also just remembered: we also have an OTEL collector that my server is reporting its data to (again, internal), so the start command for the API gateway is actually
node --require '@opentelemetry/auto-instrumentations-node/register' dist/apps/ssr-api-gateway/main
to do all the OpenTelemetry auto-instrumentation stuff. I can test with it disabled + V2, maybe.
Brody • 6mo ago
Are you sure there aren't errors anywhere, and something is going into a retry loop and bloating the bandwidth?
Prodigga (OP) • 6mo ago
As a side note, for the window of time we were running on the V2 runtime, the API gateway's response time was great and so stable 😛
[image]
Prodigga (OP) • 6mo ago
Possible? I can never discount it, I guess? But it would have to be in response to a client request. No one else pokes this server; it only gets requests from clients in the game, and each request is proxied and the response sent back. But the logs are clean, and I DO get errors when proxies fail in other cases.
Brody • 6mo ago
Then that rules that out.
Prodigga (OP) • 6mo ago
Just double-checking the logs. Yeah, nothing suspicious: no errors in the last 5 hours, except for when I reverted back to Legacy and redeployed 😛 I'll try the V2 runtime without OTEL instrumentation, just in case :3HC_Shrug:
Brody • 6mo ago
Hypothetically, what would happen if Railway isn't able to determine the cause of your increased network usage? (It's still the weekend, so I can't bring anyone in yet anyway.)
Prodigga (OP) • 6mo ago
I would stick to Legacy, and if Legacy is going to be removed, assuming the bandwidth costs don't come down by that time, then we will have to leave! I know bare metal is around the corner, though, and I am in talks with some lovely folk at RW about trying it out for some of our bandwidth-heavy services (they've been lovely to deal with). We also have some major bandwidth optimisations coming soon, so that will help bring the cost down too! But to give you an idea, if our bandwidth use just tripled, we would be paying about 1500 USD for bandwidth, which is, ugh, a LOT for us. It would be worth investing the effort to port somewhere else with cheaper costs at that point.
Brody • 6mo ago
1500 USD is nothing
[image]
Prodigga (OP) • 6mo ago
Haha, what are YOU doing! 😆
Brody • 6mo ago
Tests to try and reproduce your issue.
Prodigga (OP) • 6mo ago
Ah, network tests 😉 Oh boy, I hope Railway slashes that bill for you for helping people out.
Brody • 6mo ago
Conductors get a 100% off coupon.
Prodigga (OP) • 6mo ago
Oh well, choo choo.
Brody • 6mo ago
Choo choo indeed. I will be bringing in char (who I think has the most to do with the V2 runtime) as soon as I feel he's available.
Prodigga (OP) • 6mo ago
No dice with disabling OTEL; still high bandwidth. Sweet, thanks, but no rush, since we can just stick to Legacy for now! How curious. Hey, by the way, if I wanted to put together as minimal a repro as I could, what's an easy way to load test? Spin up two services and get them to talk to each other via the public URL? My repro service will be a stripped-down NestJS REST API that "proxies" messages.
Brody • 6mo ago
I just have this:
[image]
Brody • 6mo ago
The service in the middle serves an infinite file (in the sense that the response is null bytes), and the downloader services request a 1 GB file from it on a loop and download it at a fixed 5 MB/s. It's a super controlled environment with no variables other than V2 or Legacy.
Prodigga (OP) • 6mo ago
Did you just whip up the code for that downloader service yourself?
Brody • 6mo ago
Yeah.
Prodigga (OP) • 6mo ago
Sweet, cool.
Brody • 6mo ago
All Go services.
Prodigga (OP) • 6mo ago
Nice, I want to learn Go. It seems nice, light, and powerful.
Brody • 6mo ago
Indeed it is. I'll try to talk to char about this when he's in tomorrow.
Prodigga (OP) • 6mo ago
Which one? A through to Z. You've got a lot to pick from? I'll see myself out
Brody • 6mo ago
"this" being the topic of the title for this thread
Prodigga (OP) • 6mo ago
I just meant because you were going to speak to "char"... You know what, never mind. It was a terrible joke, haha.
Brody • 6mo ago
Ohhhhh, I see what you mean.