Railway•16mo ago
fedev

Websocket disconnecting

Hey, I have an app that uses a websocket (hosted on Railway), but every once in a while (every 2 to 2.5h) the websocket just disconnects. It then reconnects, since I set up a reconnecting websocket, but because there is high traffic, I can lose a lot of data in the second it is disconnected. I saw another issue of this type where you guys suggested using pings (which I'm sending about every 10 seconds, both from the websocket server and from the app connected to it) and using the .up.railway.app domain instead of a custom one, as written here https://help.railway.app/troubleshooting/hKqw9Dr1moxfDySTy98E6G/websocket-connections-disconnecting/sb6bfjV6UnuMwkRyAvQ3Sb and I'm doing that too. I'll also attach some logs of the client reconnecting to the websocket every 2 or 2.5h.
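One way to limit data loss during the short gap between a disconnect and the reconnect is to queue outbound messages on the sending side and flush them once the socket is open again. The sketch below is only an illustration under assumptions: `BufferedSender`, `Sender`, and the queue cap are hypothetical names that do not come from the thread, and it only covers messages that have not been handed to the socket yet (anything already in flight can still be lost).

```typescript
// Hypothetical minimal shape of the socket: anything with a send() method.
interface Sender {
  send(data: string): void;
}

// Queues messages while no socket is attached and flushes them on reconnect.
class BufferedSender {
  private queue: string[] = [];
  private socket: Sender | null = null;
  private maxQueued: number;

  constructor(maxQueued: number = 10000) {
    this.maxQueued = maxQueued;
  }

  // Call from the "open" handler once a (re)connection is established.
  attach(socket: Sender): void {
    this.socket = socket;
    // Flush queued messages in their original order.
    while (this.queue.length > 0) {
      socket.send(this.queue.shift() as string);
    }
  }

  // Call from the "close" handler.
  detach(): void {
    this.socket = null;
  }

  send(data: string): void {
    if (this.socket !== null) {
      this.socket.send(data);
      return;
    }
    // Drop the oldest message when the cap is reached so memory stays bounded.
    if (this.queue.length >= this.maxQueued) {
      this.queue.shift();
    }
    this.queue.push(data);
  }

  pendingCount(): number {
    return this.queue.length;
  }
}
```

Dropping the oldest message at the cap is a design choice for the sketch; a feed where every message matters would need acknowledgements or replay from the server instead.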
111 Replies
Percy
Percy•16mo ago
Project ID: b16be651-2cf1-4464-91ef-3890dc9c63aa
fedev
fedevOP•16mo ago
b16be651-2cf1-4464-91ef-3890dc9c63aa (websocket service id)
Brody
Brody•16mo ago
this is a websocket connection between a client and your railway service right?
fedev
fedevOP•16mo ago
between a nodejs client and a nodejs websocket server (both on railway)
Brody
Brody•16mo ago
are they in the same project?
fedev
fedevOP•16mo ago
you mean service?
Brody
Brody•16mo ago
I mean project
fedev
fedevOP•16mo ago
no they're not
Brody
Brody•16mo ago
should they be?
fedev
fedevOP•16mo ago
hmm no why?
Brody
Brody•16mo ago
because it seems like they should, they talk to each other after all, and if you put them in the same project, you can use the internal networking
fedev
fedevOP•16mo ago
but it's a websocket server, it doesn't need to be in the same project. It works fine, just that sometimes it disconnects
Brody
Brody•16mo ago
also you misread the help page: it says to use a custom domain instead of the railway domain, you have it the other way around
fedev
fedevOP•16mo ago
oh god...😂 I feel so stupid ahah
Brody
Brody•16mo ago
either way, I'm kinda leaning towards this not being an issue with railway. The times between your disconnects are not fixed: some 2 hours, some 4 hours, etc. So I think you should start logging the disconnect reason, and once you have those error logs, go from there
fedev
fedevOP•16mo ago
will try with custom domain and add some logs and see then 😅
Brody
Brody•16mo ago
also also, I'm pretty sure fp or char told me that it has since been fixed a long time ago
fedev
fedevOP•16mo ago
oh ok
Brody
Brody•16mo ago
so yeah, hold off on the custom domain for now. Log both error and disconnect events on both ends of the websocket
fedev
fedevOP•16mo ago
the only log i can get from websocket close event is the code
fedev
fedevOP•16mo ago
code 1006
It is designated for use in
applications expecting a status code to indicate that the
connection was closed abnormally, e.g., without sending or
receiving a Close control frame.
Brody
Brody•16mo ago
there are error and close event emitters for the websocket connection, you will want to print the reason for error and the reason for disconnect respectively
fedev
fedevOP•16mo ago
I'm listening to both of them. error is not firing, only close
Brody
Brody•16mo ago
you will want to print the reason for error and the reason for disconnect respectively
fedev
fedevOP•16mo ago
but how can i print the error if there is no error...
Brody
Brody•16mo ago
if there was no error then you wouldn't have a disconnect, there is an error, print it
fedev
fedevOP•16mo ago
this.ws.on("close", closeMessage => {
  console.log("WEBSOCKET CLOSE", closeMessage); // 1006 WAS LOGGED
  if (this.shouldReconnect) {
    this.scheduleReconnect();
    this.stopPingTimer();
  }
});

this.ws.on("error", (error: any) => {
  // if (error.code === "ECONNREFUSED") return;
  console.log(`${this.nameIdentifier} WebSocket error`, error); // NOTHING WAS LOGGED
});
Brody
Brody•16mo ago
I just can't see how this issue is railway's fault, given that the time between disconnects is sporadic. If there were timeouts for websocket connections in place, you would see a constant time between disconnects. Like I have said, log the reason, you are only logging the code
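The "fixed versus sporadic" argument can be checked against the reconnect logs rather than by eye. The sketch below is an illustration only: the function names and the 5-minute tolerance are assumptions, not anything used in the thread. It computes the gaps between disconnect timestamps and reports whether they cluster around a single value, as a fixed timeout would produce.

```typescript
// Gaps, in minutes, between consecutive disconnect timestamps (ms since epoch).
function disconnectGapsMinutes(timestampsMs: number[]): number[] {
  const sorted = [...timestampsMs].sort((a, b) => a - b);
  const gaps: number[] = [];
  for (let i = 1; i < sorted.length; i++) {
    gaps.push((sorted[i] - sorted[i - 1]) / 60000);
  }
  return gaps;
}

// True when every gap lies within `toleranceMinutes` of the median gap,
// which is what a fixed idle or connection timeout would look like.
function looksLikeFixedTimeout(gaps: number[], toleranceMinutes: number = 5): boolean {
  if (gaps.length < 2) return false; // not enough data to say anything
  const sorted = [...gaps].sort((a, b) => a - b);
  const median = sorted[Math.floor(sorted.length / 2)];
  return gaps.every((g) => Math.abs(g - median) <= toleranceMinutes);
}
```

A `false` result is only a hint. Later in this thread the sporadic disconnects were still reproduced on the platform and attributed to the edge proxy.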
fedev
fedevOP•16mo ago
ok then, I will see if I can manage to figure it out, thank you anyway. Maybe the problem is that I'm using ws.send("ping") instead of ws.ping()? How am I not logging the reason? There is console.log(error) and console.log(closeMessage)
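On the `ws.send("ping")` versus `ws.ping()` question: `send("ping")` transmits an ordinary text message, while `ping()` in the Node `ws` package sends a protocol-level Ping control frame, which the peer answers with a Pong. A minimal heartbeat sketch follows, assuming a socket that exposes `ping()` and `terminate()` and emits a `"pong"` event, as `ws` does. The `Heartbeat` class and `PingableSocket` interface are hypothetical names for illustration.

```typescript
// Minimal shape of the socket methods this sketch relies on.
interface PingableSocket {
  ping(): void;
  terminate(): void;
}

// Tracks whether the previous ping was answered before sending the next one.
class Heartbeat {
  private awaitingPong = false;
  private socket: PingableSocket;

  constructor(socket: PingableSocket) {
    this.socket = socket;
  }

  // Wire this to the socket's "pong" event.
  onPong(): void {
    this.awaitingPong = false;
  }

  // Call on a timer. Returns false once the connection is considered dead.
  tick(): boolean {
    if (this.awaitingPong) {
      // The previous ping was never answered: drop the connection.
      this.socket.terminate();
      return false;
    }
    this.awaitingPong = true;
    this.socket.ping();
    return true;
  }
}

// Possible wiring with a `ws` socket (not executed here):
//   const hb = new Heartbeat(ws);
//   ws.on("pong", () => hb.onPong());
//   const timer = setInterval(() => { if (!hb.tick()) clearInterval(timer); }, 10000);
```

Keeping the timer outside the class makes the logic easy to test. As the rest of the thread shows, a heartbeat alone did not prevent these particular disconnects.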
Brody
Brody•16mo ago
you are only logging the code. Please read the reference docs for the close event https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/close_event
fedev
fedevOP•16mo ago
you mean I'm doing console.log(closeMessage) instead of console.log(closeMessage.reason)?
Brody
Brody•16mo ago
please read the reference docs for the close event https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/close_event
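For context on "log the reason": in the browser API the close event carries both `code` and `reason`, and in the Node `ws` package the `"close"` listener is called with two arguments, the code and the reason. A handler that takes a single parameter therefore only sees the code. The helper below is a sketch, and `describeClose` and the code table are illustrative names rather than part of any library. It formats both values. With code 1006 the reason is normally empty, because no Close frame was received at all.

```typescript
// A few well-known close codes from RFC 6455, section 7.4.1.
const CLOSE_CODE_NAMES: { [code: number]: string | undefined } = {
  1000: "normal closure",
  1001: "going away",
  1006: "abnormal closure, no Close frame received",
  1011: "internal error",
};

// `reason` may be a string (browser) or a Buffer (ws); both have toString().
function describeClose(code: number, reason: { toString(): string }): string {
  const name = CLOSE_CODE_NAMES[code] ?? "unknown";
  const text = reason.toString();
  return `code=${code} (${name}) reason=${text === "" ? "<empty>" : text}`;
}

// Possible wiring with a `ws` socket (not executed here):
//   ws.on("close", (code, reason) => console.log(describeClose(code, reason)));
// In a browser:
//   socket.addEventListener("close", (e) => console.log(describeClose(e.code, e.reason)));
```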
fedev
fedevOP•16mo ago
I have read it and still not understanding what you are saying
Brody
Brody•16mo ago
I don't think my attempts to guide you in the right direction are bearing any fruit. I have other threads from other users to attend to and wish you the best as you debug your issue. Hopefully, it gets resolved! 🙂
fedev
fedevOP•16mo ago
ok sorry, thank you anyway railway
fedev
fedevOP•16mo ago
Hey brody, I'm pretty sure that the websocket disconnecting issue is a problem on railway's side. I have deployed the same websocket server on AWS and 11 hours later (for me it's 20:10) the node client still hasn't disconnected from the websocket
Brody
Brody•16mo ago
okay try a custom domain next
fedev
fedevOP•16mo ago
Yes I already tried it yesterday
Brody
Brody•16mo ago
try on fly?
fedev
fedevOP•16mo ago
Wym?
Brody
Brody•16mo ago
fly.io. You tried it on aws, now try it on fly.io
fedev
fedevOP•16mo ago
Oh its a hosting platform?
Brody
Brody•16mo ago
yeah
fedev
fedevOP•16mo ago
Ok later will try to host on fly and tomorrow will see if it disconnected then
fedev
fedevOP•16mo ago
I tried with fly and 10h and 30 min later its still connected
fedev
fedevOP•16mo ago
i also retried with railway websocket again just to make sure and less than 2h later it disconnected
Brody
Brody•16mo ago
okay, I'm gonna run a test myself too, if I can reproduce this, I will get the team involved for you
fedev
fedevOP•16mo ago
Ok thanks, for the moment I deployed the websocket server on fly.io
Brody
Brody•16mo ago
what would you say the max time you have been able to keep a websocket connection open for on railway is?
fedev
fedevOP•16mo ago
from this screenshot 9.5h, but generally it's 2h
Brody
Brody•16mo ago
well, I hope I don't have to get back to you in 9.5 hours. I also kinda hope I can reproduce this, I'm not sure what I'd say to you if I can't
fedev
fedevOP•16mo ago
not urgent ahah. If you want, I can provide you the code I'm using
Brody
Brody•16mo ago
all we will be waiting for is my websockets test to disconnect
fedev
fedevOP•16mo ago
yeah
Brody
Brody•16mo ago
can reproduce
Duchess
Duchess•16mo ago
Thread has been flagged to Railway team by @Brody.
Brody
Brody•16mo ago
@char8 - websocket disconnects
char8
char8•15mo ago
Actively looking into this - have an idea of what it could be, so testing that hypothesis. Will update here when I have more.
fedev
fedevOP•15mo ago
Hey any news?
Ray
Ray•15mo ago
Hey, not yet, sorry. It may be related to some other network issues we're going to investigate soon. The team's heads down on getting regions out atm
fedev
fedevOP•14mo ago
Still no updates on this? 😕
Brody
Brody•14mo ago
well, as long as envoy has a memory leak lol. I've said this before, but are you sure you don't want to run this communication through the private network?
fedev
fedevOP•14mo ago
Could try it. Tho I've never used it. I tried reading the docs but don't really understand how to get it working
Brody
Brody•14mo ago
everything in the same project, then just use internal domains and the port your app runs on when opening the ws connections
fedev
fedevOP•14mo ago
cool, managed to connect, I had forgotten to change to ws instead of wss. But is it like a guaranteed thing that with private networking it won't disconnect?
Brody
Brody•14mo ago
well, there's no proxy with the private network
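A small sketch of the URL change described above: on the private network the client connects to the internal domain and the app's own port with `ws://`, while the public domain uses `wss://`. The host name, the port, and the `buildWsUrl` helper are placeholders for illustration, so substitute the service's actual internal domain and port.

```typescript
interface WsTarget {
  host: string;            // internal or public domain of the websocket service
  port?: number;           // the port the app listens on (needed on the private network)
  privateNetwork: boolean; // true when both services share a project
}

// Picks ws:// for the private network and wss:// for the public domain.
function buildWsUrl(target: WsTarget): string {
  const scheme = target.privateNetwork ? "ws" : "wss";
  const port = target.port !== undefined ? `:${target.port}` : "";
  return `${scheme}://${target.host}${port}`;
}

// Hypothetical example values:
const internalUrl = buildWsUrl({
  host: "ws-server.railway.internal",
  port: 8080,
  privateNetwork: true,
});
// internalUrl === "ws://ws-server.railway.internal:8080"
```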
fedev
fedevOP•14mo ago
ok then for the moment I will use private networking, thank you brody
Ray
Ray•14mo ago
This happens as a side effect of our edge proxy. No short-term fix for now 😦 Sorry for the late follow-up, and the bad news on this. We're reworking a major layer of all this soon
fedev
fedevOP•14mo ago
I understand, thank you anyway. Thanks
Dayblox
Dayblox•14mo ago
Might just be needing a keepalive ?
Brody
Brody•14mo ago
both ray and char8 have confirmed it's not a code/config issue
Dayblox
Dayblox•14mo ago
I meant a keepalive message to avoid inactivity. But 9 hours feels lengthy
Brody
Brody•14mo ago
both fedev's app and my app have constant activity, my test had a message every second, so it's railway
Dayblox
Dayblox•14mo ago
Alright, sorry I'm caught up now
Brody
Brody•14mo ago
no worries at all
char8
char8•14mo ago
We’ve lowered the envoy restart times to once a week for the edge proxies a few weeks ago. The routing ones still reset once a day, looking into upping that as well (these eat up a fair bit more ram). Hopefully some improvement though 🤞
ThallesComH
ThallesComH•3w ago
yo @Brody is this still an issue? my Interval template suffers from the same thing and their reconnection logic is not that good so sometimes it breaks out of nowhere (totally code's fault).
ThallesComH
ThallesComH•3w ago
im using custom domain and all that jazz, hope its not a problem with Cloudflare
Brody
Brody•3w ago
it's no longer an issue. If there are still issues, they wouldn't be platform related
ThallesComH
ThallesComH•3w ago
idk what to say, is there anything I could do to make sure it's not Railway? Maybe host it somewhere else to test it out? Some context: I hosted that application for a long time on Railway with their cloudy thingy and never had a problem. Now the cloudy thingy is on Railway and it's breaking
Brody
Brody•3w ago
cloudy thing?
ThallesComH
ThallesComH•3w ago
interval.com was a SaaS product, then they released an open source version where I can host the SaaS myself. But my code that interacts with the SaaS product was always on Railway
ThallesComH
ThallesComH•3w ago
hmm this is documented
ThallesComH
ThallesComH•3w ago
maybe tcp proxy doesnt have that issue? im fine with no ssl, better than no application at all
Brody
Brody•3w ago
your railway service is connecting out to interval websocket server?
ThallesComH
ThallesComH•3w ago
yep through public
Brody
Brody•3w ago
not quite how it works, the TCP proxy or the domains on the service have absolutely nothing to do with connecting out to 3rd party services
ThallesComH
ThallesComH•3w ago
? The interval websocket server is hosted on Railway. I'm basically connecting from a Railway service to a Railway service through websocket
Brody
Brody•3w ago
are you using the private network?
ThallesComH
ThallesComH•3w ago
no the service is in a separate project
Brody
Brody•3w ago
why not in the same project?
ThallesComH
ThallesComH•3w ago
it's a general purpose service. I could move it, yeah, but I would have to put all of our future services into that same project
Brody
Brody•3w ago
fair. Well, we have a completely different proxy from when the issue was originally opened
ThallesComH
ThallesComH•3w ago
yeah i guess it was Envoy? i remember someone saying that Envoy was kinda limiting Railway
Brody
Brody•3w ago
yes, it used to be envoy. Envoy does not serve any traffic anymore at this point
ThallesComH
ThallesComH•3w ago
but yeah, idk what to do, I'll try the tcp proxy to see if it works. Wish there was some kind of guide for connecting different projects through Tailscale, maybe that's possible
Brody
Brody•3w ago
TCP proxy and HTTP proxy are the same btw, the HTTP proxy is a wrapper on top of the TCP proxy
ThallesComH
ThallesComH•3w ago
well the guide mentions http/1.1
Brody
Brody•3w ago
docs are outdated
ThallesComH
ThallesComH•3w ago
and the tcp proxy is supposed to be more reliable IMO, as most clients don't handle TCP reconnections that well. Sad times
Brody
Brody•3w ago
how often are you seeing disconnections?
ThallesComH
ThallesComH•3w ago
let me see. Seems to be pretty random tbh
Brody
Brody•3w ago
yikes. If our proxy was that unstable, and with just under half a million domains, we would know about it
ThallesComH
ThallesComH•3w ago
sometimes it seems to be 5-6 hours, and then when a disconnection happens, it seems to reconnect a few times until it stabilizes. But again, I could try to host the SaaS thingy somewhere else to see if it drops the connection constantly
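Since the client here reportedly reconnects several times before it stabilizes, one common way to make reconnect logic less fragile is capped exponential backoff with some jitter. This is a generic sketch, not the Interval SDK's actual logic. The function names, the 1 s base, and the 30 s cap are assumptions for illustration.

```typescript
// Delay before reconnect attempt number `attempt` (0-based), doubling each
// time and capped at `maxMs`.
function reconnectDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  const exponential = baseMs * Math.pow(2, Math.max(0, attempt));
  return Math.min(maxMs, exponential);
}

// Spreads the delay over [50%, 100%] of its value so that many clients do not
// reconnect at the same instant. `rand` is injectable to keep tests deterministic.
function withJitter(delayMs: number, rand: () => number = Math.random): number {
  return Math.round(delayMs * (0.5 + 0.5 * rand()));
}

// Possible use inside a reconnect loop (not executed here):
//   setTimeout(connect, withJitter(reconnectDelayMs(attempt)));
```

Resetting `attempt` to 0 only after the connection has stayed open for a while avoids hammering the server during a flapping period.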
Brody
Brody•3w ago
have you checked the http logs to see what they say?
ThallesComH
ThallesComH•2w ago
yep, nothing that caught my attention. I'll disable Cloudflare to see if that helps. It was Cloudflare 💀 I could probably enable Cloudflare and use a secret Railway domain for the public communication. Security through obscurity, but what can I do
Brody
Brody•2w ago
and you doubted me
Brody
Brody•2w ago
accepted