It seems like WebSockets just stop working?
Hi, I've been running WebSocket clients with two separate libraries. Each works for a certain amount of time, but then just stops receiving events after a while.
I originally thought this was a lib issue or the provider I am using for data was not sending the event, but the common thread seems to be Railway.
Is there some way to confirm this?
Project ID:
bdcc0a4b-7e77-4c0a-9a5f-9099d793cead
Looks like this is the issue: https://help.railway.app/troubleshooting/hKqw9Dr1moxfDySTy98E6G/websocket-connections-disconnecting/sb6bfjV6UnuMwkRyAvQ3Sb
can you be more specific about how long you are able to keep a websocket connection open for? that article is very old and not completely accurate
I am testing two different libraries with web sockets since I thought it was the library. Based on the logs, it seems like 24 hours or less.
then yes 24 hours would be the max time, that article was talking about a max time of around a few hours, the current limitation is 24 hours
you would want to have your websocket connections reconnect on disconnect
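A minimal sketch of reconnect-on-disconnect with exponential backoff, in Python. `run_session` is a hypothetical callable (not from any library mentioned in this thread) that is assumed to open the websocket, subscribe, and block until the connection ends, returning on a clean close and raising `ConnectionError` or another `OSError` on failure. The delay values are illustrative.

```python
import time
from typing import Callable, Optional


def listen_with_reconnect(
    run_session: Callable[[], None],
    max_sessions: Optional[int] = None,
    base_delay: float = 1.0,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `run_session` again each time it ends; return sessions started.

    `run_session` is a hypothetical stand-in for "connect, subscribe, and
    block until the socket closes". With max_sessions=None this loops forever.
    """
    delay = base_delay
    sessions = 0
    while max_sessions is None or sessions < max_sessions:
        sessions += 1
        try:
            run_session()
            delay = base_delay  # the session ended cleanly: reset the backoff
        except OSError:
            pass  # dropped or failed to connect: keep the current backoff
        if max_sessions is None or sessions < max_sessions:
            sleep(delay)                 # wait before reconnecting
            delay = min(cap, delay * 2)  # grow the delay, up to `cap` seconds
    return sessions
```

In a real client, `run_session` would wrap whatever connect-and-subscribe call the websocket library provides, and `max_sessions` would normally be left as `None`.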
also, using a railway domain or a custom domain doesn't change anything anymore, despite what that article said
Ah okay! I'll remove it then
@char8 - another report of the proxy restarts affecting a user
Okay I'll update my code tomorrow and give it a try
question, the communication between the websocket server and client, are these two railway services? what do you have going on?
Server on Railway is listening for websocket events from Alchemy, so waiting for infrequent blockchain events
so you aren't running a websocket server yourself, just a client?
That's right. I'm not running it and just running a client
then I don't think you'd be touching railway's proxy
is this maybe a limitation of alchemy's websocket server?
That's what I originally thought, so I also tested it with infura
Similar results
I assume infura is a similar service as alchemy's?
Yeah that's right
have you tested your code on a different platform so you can rule out your code as a factor?
Only tested it on Railway
would you be up to testing it on fly perhaps?
It's a non-zero probability that it's my code, but I see it working in the beginning
Fly has too much friction lol
I moved away from fly
I can definitely try to reconnect tomorrow and see what happens
haha you aren't wrong
does this websocket connection have ping pongs?
It might... I have to check
Not sure if the library exposes that to me
Actually I don't think the websocket servers provide that
I know fly doesn't have the best DX, but from what I can tell, they do have a more stable networking setup (sorry char 😦 ), so if you could please run your code for 24 hours on fly so we can rule out any issues with your code or the platform
I'll consider it, but I'm going to try the reconnect suggestion first
well of course that would work
but that's just hiding a potential problem
Yeah could be! I'm doubtful it's the code though cuz it's a very simple copypasta.
I'll let you know if I end up trying fly and report back what I find.
okay!
this sounds like something to ask infura/alchemy. Those would be normal outbound connections from our standpoint and we don't intercept traffic there.
WebSocket ping/pongs are a good first step (they prevent an intermediate idle timeout from killing the connection and help identify zombie connections), but you should always implement reconnect.
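A rough sketch of the ping/pong bookkeeping described here, in Python. The `Heartbeat` class, its method names, and the 30 s / 10 s values are assumptions made for illustration, and actually sending the ping frame is left to whatever websocket library is in use. Ping and pong are control frames defined by the WebSocket protocol itself (RFC 6455), so a compliant server is expected to answer a ping even if its API docs never mention one.

```python
import time
from typing import Callable, Optional


class Heartbeat:
    """Track ping/pong timing to spot a connection that has silently died."""

    def __init__(
        self,
        interval: float = 30.0,  # idle seconds before sending a ping
        timeout: float = 10.0,   # seconds to wait for the matching pong
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval = interval
        self.timeout = timeout
        self._clock = clock
        self._last_seen = clock()  # last time a pong arrived
        self._ping_sent_at: Optional[float] = None

    def should_ping(self) -> bool:
        """True when the link has been idle for `interval` and no ping is pending."""
        idle = self._clock() - self._last_seen
        return self._ping_sent_at is None and idle >= self.interval

    def on_ping_sent(self) -> None:
        """Call right after the library has sent a ping frame."""
        self._ping_sent_at = self._clock()

    def on_pong(self) -> None:
        """Call when a pong frame arrives."""
        self._last_seen = self._clock()
        self._ping_sent_at = None

    def is_zombie(self) -> bool:
        """True when a ping has gone unanswered for longer than `timeout`."""
        if self._ping_sent_at is None:
            return False
        return self._clock() - self._ping_sent_at > self.timeout
```

When `is_zombie()` turns true, the client would close the socket and go through its normal reconnect path instead of waiting on a connection that will never deliver another event.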
Assuming those folks run cloud infra, you'll see connections drop out whenever they cycle their proxy boxes / scale their fleet / rebalance connections. If it's a low freq. channel, the risk of missing events is hopefully low provided the reconnect is fast. For a high freq. stream you'd usually expect the API to provide some form of resume key so you can resume the stream from where you left off.
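To illustrate the resume-key idea, here is a small Python sketch. It assumes each event carries a unique, increasing `position` value and that the provider offers some way to fetch events from a given position onward. Both are hypothetical: `fetch_since` is a placeholder, and nothing in this thread confirms that Alchemy or Infura expose such a call.

```python
from typing import Callable, Dict, Iterable, Optional

Event = Dict[str, int]  # hypothetical event shape: {"position": <int>, ...}


class ResumeCursor:
    """Remember the position of the last event that was processed."""

    def __init__(self) -> None:
        self.last_position: Optional[int] = None

    def record(self, event: Event) -> bool:
        """Return True if the event is new, False if it was already seen."""
        position = event["position"]
        if self.last_position is not None and position <= self.last_position:
            return False
        self.last_position = position
        return True


def catch_up(
    cursor: ResumeCursor,
    fetch_since: Callable[[int], Iterable[Event]],
    handle: Callable[[Event], None],
) -> int:
    """After a reconnect, replay events missed while disconnected.

    `fetch_since` is a hypothetical callable that returns events at or after
    the given position. Returns how many missed events were handled.
    """
    if cursor.last_position is None:
        return 0  # nothing processed yet, so there is nothing to resume from
    handled = 0
    for event in fetch_since(cursor.last_position):
        if cursor.record(event):  # skips anything already processed
            handle(event)
            handled += 1
    return handled
```

The live stream handler would call `cursor.record(event)` for every incoming event, and the reconnect path would call `catch_up` once before resuming the live subscription.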
I actually asked Alchemy and they ran a test with my code and events and couldn't reproduce it. They suggested it might be my host which is why I opened a ticket here.
haha did they run a test for more than 24 hours though
yeah
That's what they said, at least. I asked for their logs and they couldn't share it.
well I'd still like you to try on fly
for lack of a better word, you need to prove this is railway's fault
Got it
Btw. I spent 2 hours trying to get it into fly.io and gave up. fly.io doesn't like my dockerfiles. I couldn't launch my other elixir app in there either before I found Railway.
Do you have any other suggestions for hosts I can just spin up easily via Dockerfile?
fly.io should automatically use a Dockerfile same as railway
either way, do you have high frequency data coming across this websocket connection?
Naw the data is very infrequent
Couldn't get it deployed, honestly. Ran into a lot of deploying issues with their cmd line tool.
okay then just do the reconnect on disconnect and call it a day
trying to do it for the brand man