Don't auto-enable the new proxy. It's bad for my app and I purposefully leave it off!
One of my consumer apps (hosted on another platform) gets hanging TCP connections to 35.212.XX.XX:443 and eventually throws ETIMEDOUT whenever the edge proxy is enabled.
Project ID:
af684ae5-8a4b-4930-9d59-6884623336d2
My app is stable without it, and there's no reason whatsoever why I'd want this thing enabled. Can I please opt out of the feature?
What are you hosting?
And no - we've messaged 9 times during the migration that we are moving off of the legacy proxy. We are only maintaining the new proxy moving forward.
Additionally, the new proxy has always been used for TCP, ever since the platform introduced TCP support.
A regular NestJS app
My consumer app hits it up multiple times
whenever I use the new proxy, the connection to this ip 35.212.XX.XX:443 hangs
and doesn't return a response
eventually it times out
and my consumer app breaks
because it expects 100% uptime from my app on railway
turning off the edge proxy solves this instantly
Gotcha- this is for TCP connections?
For regular HTTPS requests
But yeah, it’s a tcp connection at the end of the day
Have you considered using the TCP proxy?
Hmm not really. I don’t see the need for it though, unless your edge proxy is unstable somehow
Which frankly seems to be the case since the issue only arises when the option is enabled
It could be, or it could be Envoy masking a malformed packet somehow.
Who’s Envoy
:HUH:
The previous proxy
Anyway, we won't cut everyone over until we have fixed everything in the new one.
However, slamming that switch back won't stop the inevitable, so glad you reported this.
Yeah please look into it. Every time it gets re-enabled the app breaks 😭
I would encourage you to use the TCP proxy for your 443 connection, and see if that helps.
If it resolves that way- then we might have to do something like a TCP -> HTTPS bridge
I’m not even doing a long-lived connection though, I’m using the custom DNS I assigned on Railway
Oh brother
Horrors beyond my comprehension
Go on 🙂
And the app calls the api from there
It’s Namecheap BasicDNS that handles the custom domain
And the timeout happens after the DNS is resolved (the ASN for 35.212.xx.xx is GCP)
So highly doubt it’s the issue
:hmmBusiness:
Should I be looking for this somewhere or should I just spin up a proxy server?
No need- you can use the setting under network
Wondering why you are running your own local DNS?
It's Namecheap BasicDNS
are you aware that Railway has an internal network for you for service-to-service comms?
Not my own local DNS
okok
usecase?
Yeah but the consumer app is on another cloud
which?
Digitalocean
You could use Tailscale to bridge the two networks
But!
I expose api endpoints to some customers
:deadge:
I don't want extra complexity ser
Let's leave you on legacy for a day or two
and we can look into this and fix the core bug
!t
Okok thanks man
New reply sent from Help Station thread:
This thread has been escalated to the Railway team.
You're seeing this because this thread has been automatically linked to the Help Station thread.
New reply sent from Help Station thread:
Bump, no time left in the day.
New reply sent from Help Station thread:
Will get to this during handoff.
cmon guys
connect ETIMEDOUT 35.212.94.98:443
:tired:
seems I'll just have to move to another hosting platform lol, every time I check my app is broken due to this option going back on
we would really appreciate it if you could provide a minimal reproducible example, because as it stands no one has reported anything similar, aka we have nothing to go off of to fix this
I mean honestly I don't even know how to get started on that. My API still works, but when the new edge proxy is on, a higher % of requests hang and fail to the point where they don't even reach my server
can you tell me where you are making these requests from, and what exactly is making the request?
are you chaining multiple services together? if so I would use OpenTelemetry (or whatever alternative you use) and see the response time from service -> service on edge and not on edge
on the internal network it shouldn't matter but sometimes the edge was slower for me on external
@Brilew - Update on this, considering we only have a single report of this issue across our nearly half a million domains, we would need a reproducible example here, otherwise we will be proceeding with the migration and removing the option to switch back to legacy.
@blank (revived) and I get this TIMEOUT error too, we reported it in another thread a long time ago and our fix was to stick to the Legacy network, though that option seems to be removed now. So our semi-working solution is to use HTTP proxies when making requests from another hosting platform to Railway.
This happens when sending a lot of requests from a single IP to Railway. It was never a problem in Legacy, not sure why it is now.
Note: even with 5k proxies rotating, we still get timeout errors. Not often but it’ll come here and there
do you happen to know how many RPS you are doing?
No more than 10 RPS
where are you making requests from?
PhoenixNAP
is this reproducible? (without proxies)
here's a quick GPT'd code that mimics what we're doing:
I may have exaggerated with 10 RPS, more like no more than 3 RPS
you'd prob have to run this all day/week to see something
and in this scenario this code is running on NAP?
yes
well unfortunately there's nothing we can do here without a reproducible example
considering OP is running on DO, I don't think it's provider specific so I guess localhost would work too
unless Railway checks the quality of IPs? Idk if DO or NAP IPs* are flagged in any way (again, no issues on Legacy)
if i have to run something for a week to see anything happen that would not qualify as reproducible
send more reqs then
sorry but no, going to close this out, feel free to open another thread if you can come up with a reproducible example.