Websocket disconnecting
Hey, I have an app that uses a websocket (hosted on railway) but for some reason every once in a while (2 - 2.5h) the websocket just disconnects (and then reconnects since I set up a reconnecting websocket) but since there is high traffic, in the second it disconnects I can lose a lot of data.
I saw another issue of this type and you guys suggested to use pings (which I'm sending about every 10 seconds both from the websocket and from the app connected to the websocket) and to use the .up.railway.app domain instead of a custom one as written here https://help.railway.app/troubleshooting/hKqw9Dr1moxfDySTy98E6G/websocket-connections-disconnecting/sb6bfjV6UnuMwkRyAvQ3Sb and I'm doing that too.
I'll also attach some logs of the client reconnecting to the websocket every 2 or 2.5h
Project ID:
b16be651-2cf1-4464-91ef-3890dc9c63aa
(websocket service id)
this is a websocket connection between a client and your railway service right?
between a nodejs client and a nodejs websocket server (both on railway)
are they in the same project?
you mean service?
I mean project
no they're not
should they be?
hmm no
why?
because it seems like they should, they talk to each other after all, and if you put them in the same project, you can use the internal networking
but it's a websocket server, it doesn't need to be in the same project, it works fine, just that sometimes it disconnects
also you misread the help page, it says to use a custom domain instead of the railway domain, you have it the other way around
oh god...😂
I feel so stupid ahah
either way I'm kinda leaning towards this not being an issue with railway
the times between your disconnects are not fixed, some 2 hours, some 4 hours, etc
so I think you should start logging the disconnect reason, and once you have those error logs, go from there
will try with custom domain and add some logs and see then 😅
also also, I'm pretty sure fp or char told me that it has since been fixed a long time ago
oh ok
so yeah hold off on the custom domain for now
log both error and disconnect events on both ends of the websocket
the only log i can get from websocket close event is the code
code 1006
there are error and close event emitters for the websocket connection, you will want to print the reason for error and the reason for disconnect respectively
I'm listening to both of them
error is not firing, only close
you will want to print the reason for error and the reason for disconnect respectively
but how can i print the error if there is no error...
if there was no error then you wouldn't have a disconnect, there is an error, print it
i just can't see how this issue is railway's fault, given the fact that the time between disconnects is sporadic, if there were timeouts for websocket connections in place you would see a constant time between disconnects
like i have said, log the reason, you are only logging the code
ok then I will see if I can manage to figure it out thank you anyways
maybe the problem is that I'm using ws.send("ping") instead of ws.ping()
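(Editor's sketch) The difference raised here is between sending an ordinary text message that happens to say "ping" and sending a protocol-level ping frame, which the peer answers with a pong automatically. Below is a minimal heartbeat sketch, assuming a socket object that exposes ping(), terminate() and on(event, handler), as WebSocket instances from the ws package do. The startHeartbeat name and the 10 second default are illustrative choices, not something taken from the thread's actual code.

```javascript
// Start a protocol-level heartbeat on a socket-like object.
// Assumption: `socket` has ping(), terminate() and on(event, handler),
// like a WebSocket from the `ws` package. startHeartbeat is a hypothetical helper.
function startHeartbeat(socket, intervalMs = 10000) {
  let alive = true;

  // A pong frame means the peer answered the previous ping.
  socket.on("pong", () => {
    alive = true;
  });

  // One heartbeat tick: if the last ping went unanswered, drop the
  // connection; otherwise mark it as pending and send a new ping frame.
  function tick() {
    if (!alive) {
      socket.terminate();
      return false;
    }
    alive = false;
    socket.ping();
    return true;
  }

  const timer = setInterval(tick, intervalMs);
  const stop = () => clearInterval(timer);

  // Stop pinging once the connection is gone.
  socket.on("close", stop);

  return { tick, stop };
}
```

With this shape, a dead connection is noticed within roughly two intervals and closed deliberately, so the reconnect logic can kick in sooner than it would by waiting for a send to fail.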
how am I not logging the reason? there is console.log(error) and console.log(closeMessage)
you are only logging the code
please read the reference docs for the close event
https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/close_event
you mean I'm doing console.log(closeMessage) instead of console.log(closeMessage.reason)?
please read the reference docs for the close event https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/close_event
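(Editor's sketch) The close event carries both a numeric code and a reason string, so logging them together is what is being asked for here. The sketch below assumes the Node ws package, where the close listener receives (code, reason) and reason may be a Buffer. The describeClose helper and the small code table are illustrative, not part of any library. Note that code 1006 means the connection dropped without a close frame, so the reason will usually be empty in that case.

```javascript
// A few RFC 6455 close codes with short descriptions (not exhaustive).
const CLOSE_CODES = {
  1000: "normal closure",
  1001: "going away",
  1006: "abnormal closure, no close frame received",
  1011: "internal server error",
};

// Build one log line from a close code and reason.
// `reason` may be a string (browser CloseEvent.reason) or a Buffer (ws package).
// describeClose is a hypothetical helper name.
function describeClose(code, reason) {
  const text = reason ? reason.toString() : "";
  const meaning = CLOSE_CODES[code] || "unknown code";
  return `close code=${code} (${meaning}) reason=${text || "<empty>"}`;
}

// Hypothetical wiring, assuming `socket` is a WebSocket from the `ws` package:
// socket.on("error", (err) => console.error("ws error:", err.message));
// socket.on("close", (code, reason) => console.log(describeClose(code, reason)));
```

Logging the same line on both the client and the server makes it easier to see which side, if either, initiated the close.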
I have read it
and I'm still not understanding what you are saying
I don't think my attempts to guide you in the right direction are bearing any fruit. I have other threads from other users to attend to and wish you the best as you debug your issue. Hopefully, it gets resolved! 🙂
ok sorry, thank you anyway
Hey brody, i'm pretty sure that the websocket disconnecting issue is a problem from railway
I have deployed the same websocket server on AWS and 11 hours later (for me it's 20:10) the node client still hasn't disconnected from the websocket
okay try a custom domain next
Yes I already tried it yesterday
try on fly?
Wym?
fly.io
you tried it on aws, now try it on fly.io
Oh its a hosting platform?
yeah
Ok later will try to host on fly and tomorrow will see if it disconnected then
I tried with fly and 10h and 30 min later it's still connected
i also retried with railway websocket again just to make sure and less than 2h later it disconnected
okay, I'm gonna run a test myself too, if I can reproduce this, I will get the team involved for you
Ok thanks, for the moment i deployed the websocket server on fly io
what would you say the max time you have been able to keep a websocket connection open for on railway is?
from this screen 9.5h but generally it's 2h
well I hope I don't have to get back to you in 9.5 hours
I also kind of hope I can reproduce this, I'm not sure what I'd say to you if I can't
not urgent ahah
if you want i can provide you the code i'm using
all we will be waiting for is my websockets test to disconnect
yeah
can reproduce
Thread has been flagged to Railway team by @Brody.
@char8 - websocket disconnects
Actively looking into this - have an idea of what it could be, so testing that hypothesis. Will update here when I have more.
Hey any news?
Hey, not yet sorry. It may be related to some other network issues we're going to investigate soon
The team's heads down on getting regions out atm
Still no updates on this? 😕
well as long as envoy has a memory leak lol
I've said this before, but are you sure you don't want to run this communication through the private network?
Could try it
Though I've never used it, I tried reading the docs but don't really understand how to get it working
everything in the same project, then just use internal domains and the port your app runs on when opening the ws connections
cool managed to connect, forgot to change to ws instead of wss
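(Editor's sketch) The two points above, internal domain plus app port and ws:// rather than wss://, can be captured in a small URL builder. This is a sketch only: the service name "ws-server" and port 8080 are placeholders, internalWsUrl is a hypothetical helper, and the ".railway.internal" suffix is an assumption about the internal domain format that should be checked against the service's private networking settings.

```javascript
// Build the WebSocket URL for a service reached over the private network.
// Assumptions: internal domains look like "<service>.railway.internal",
// and the connection is plain ws:// (no TLS) on the port the app listens on.
function internalWsUrl(serviceName, port) {
  return `ws://${serviceName}.railway.internal:${port}`;
}

// Placeholder values for illustration only.
const url = internalWsUrl("ws-server", 8080);
console.log(url);
```

The client would pass this URL to its WebSocket constructor in place of the public wss:// address.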
but is it like a guaranteed thing that with private networking it won't disconnect?
well there's no proxy with the private network
ok then for the moment I will use private networking, thank you brody
This happens as a side effect of our edge proxy. No short-term fix for now 😦
Sorry for the late follow up, and the bad news on this. We're reworking a major layer on all of this soon
I understand thank you anyway
Might just be needing a keepalive ?
both ray and char8 have confirmed it's not a code/config issue
I meant a keepalive message to avoid inactivity
But 9 hours feels lengthy
both fedev's app and my app have constant activity, my test had a message every second, so it's railway
Alright, sorry I'm caught up now
no worries at all
We’ve lowered the envoy restart times to once a week for the edge proxies a few weeks ago. The routing ones still reset once a day, looking into upping that as well (these eat up a fair bit more ram). Hopefully some improvement though 🤞
yo @Brody is this still an issue? my Interval template suffers from the same thing and their reconnection logic is not that good so sometimes it breaks out of nowhere (totally code's fault).
im using custom domain and all that jazz, hope its not a problem with Cloudflare
it's no longer an issue
if there's still issues, they wouldn't be platform related
idk what to say, is there anything i could do to make sure its not Railway? maybe host it somewhere else to test it out?
some context:
i hosted that application for a long time in Railway with their cloudy thingy and never had a problem
now cloudy thingy is in Railway and its breaking
cloudy thing?
interval.com was a SaaS product
then they released an open source version where I can host the SaaS
but my code where it interacts with the SaaS product was always in Railway
hmm this is documented
maybe tcp proxy doesnt have that issue? im fine with no ssl, better than no application at all
your railway service is connecting out to interval websocket server?
yep
through public
not quite how it works, the TCP proxy or the domains on the service have absolutely nothing to do with connecting out to 3rd party services
? the interval websocket server is hosted on Railway
im basically connecting from a Railway service to a Railway service through websocket
are you using the private network?
no
the service is in a separate project
why not in the same project?
its a general purpose service
i could move it yeah but i would have to put all of our future services into that same project
fair
well we have a completely different proxy from when the issue was originally opened
yeah i guess it was Envoy? i remember someone saying that Envoy was kinda limiting Railway
yes it used to be envoy
envoy does not serve any more traffic at this point
but yeah idk what to do, i'll try tcp proxy to see if it works
wish there was some kind of guide for connecting different projects through Tailscale, maybe that's possible
TCP proxy and http proxy are the same btw
the HTTP proxy is a wrapper on top of the TCP proxy
well the guide mentions http/1.1
docs are outdated
and tcp proxy is supposed to be more reliable IMO as most clients don't handle TCP reconnections that well
sad times
how often are you seeing disconnections?
let me see
seems to be pretty random tbh
yikes
if our proxy was that unstable, and with just under half a million domains, we would know about it
sometimes it seems to be 5-6 hours
and then when a disconnection happens, it seems to reconnect a few times until it stabilizes
but again, i could try to host the SaaS thingy somewhere else to see if it drops connections constantly
have you checked the http logs to see what they say?
yep, nothing that caught attention
i'll disable Cloudflare to see if that helps
it was cloudflare 💀
i could probably enable cloudflare and use a secret railway domain for the public communication
security through obscurity but what can i do
and you doubted me
accepted